Extreme value models are used for estimating engineering design forces of extreme events like earthquakes, winds, rainfall, floods, etc, (???). Structures designed with these forces, holding a balance between safety and cost, will survive while being requested by an extreme event from a natural phenomenon (???).
This research presents an application of extreme value analysis to estimate wind velocities for infrastructure design. Consequently, the main interest are probable future extreme events that structures need to be able to resist (???).
This research follows the methodological approach defined in chapter 26 of the ASCE7-16 standard (???). ASCE7-16 considers design wind velocities for various mean recurrence intervals MRIs, depending on the risk category of the structure, as follows: MRI=700 years for risk category (RC) I and II, 1700 years for RC III, and 3000 for RC IV. A wind speed linked to a mean recurrence interval - MRI of N-years (N-years return period) is interpreted as the highest probable wind speed along the period of N-years (???). The annual probability of equal or exceed that wind speed is 1/N, this is with a change of being equaled or exceeded only one time in the corresponding MRI period.
The development of this research (focused in non-hurricane data), covers three main areas, downscaling support, temporal analysis, and spatial analysis, and includes an integration process with existing results of hurricane studies.
Due to the specific characteristics of the study area where there is lack of historical wind measurements, it became necessary to look for alternative data sources: ISD, and ERA5 forecast data. This resulted in a downscaling issue that was confronted from a graphic comparison of all sources by matching stations, in the search of adequate downscaling support for the use of complementary data. Prior to the comparison process, ISD and IDEAM data sources were standardized to represent 3-second wind gust, 10 meters of anemometer height, and terrain open space roughness.
The temporal analysis method used to calculate the return levels at each station from the historical wind time series, is the Peaks Over Threshold POT using a non-homogeneous bi-dimensional Poisson Process PP recommended by (???) and developed in (???). Main components of POT-PP model are de-clustering, thresholding, intensity function fitting, hazard curve, and return levels calculation. At each station with non-thunderstorm data, this model starts with a process of de-clustering choosing a suitable threshold level to leave for the analysis only the most extreme available values, and then, fit to the data an intensity function using maximum likelihood to find optimal parameters with the best goodness of fit. With the fitted model, and using the hazard curve, it was possible to calculate extreme wind velocities or return levels for required MRIs.
The integration of all these results allow to generate non-hurricane continuous maps of extreme winds velocities (using Kriging as spatial analysis method), which are combined with existing wind extreme hurricane studies to be used as input loads for the design of structures of different risk categories, i.e., less risky/important structures for short MRIs (700 and 1700 years), and highly important structures for the longest MRI of 3000 years.
To design a specific structure, horizontal forces (wind and earthquake) play an starring role. For Colombia, initially, wind forces are calculated considering a fixed velocity value of 100 km/h, later, a continuous map with a return period of 50 years was included in the official design standard. Afterwards, an additional map with a return period of 700 years was added (???).
In the context of this study, extreme wind analysis is concerned with statistical methods applied to very high values of wind velocity as random variable in a stochastic process, to allow statistical inference from historical data. Extreme analysis methods assess the probability of wind events that are more extreme than the ones previously registered and included in the input model of the maximum wind velocities ordered sample. (???) presents a detailed study about classical extreme value theory and threshold models. Asymptotic extreme value models arguments give a convenient representation of the stochastic behavior of maximum values (???).
According to (???), there are four main elements needed for a good analysis of extreme values: (a) appropriate selection of an asymptotic model; (b) use of all pertinent and available information; (c) properly estimation of uncertainty; and (d) considering non-stationary effects.
In general, there are two approaches to deal with extreme value analysis (???), the classical approach, and peaks over threshold POT. In this research, POT is selected over classical approach to be able to use more samples for statistical estimation.
The classical approach or traditional method, as well known as sample maxima or yearly maxima is associated to a generalized extreme value distribution GEV (???; ???). GEV is a family of limit probability distributions including Gumbel, Fréchet and Weibull, unified in (???; ???). The GEV family describes all limiting distributions of the centered and normalized sample maximum (???).
POT models the values above a chosen high level, and in general, the POT method has two approaches (???): (a) exceedances over threshold associated to a Generalized Pareto Distribution GPD, onwards POT-Poisson-GPD; and (b) the exceedances over threshold associated to a non-homogeneous non-stationary bi-dimensional Poisson process POT-PP (a point process approach). POT-PP is considering to be more flexible than generalized Pareto approach (???), so for this study, POT-PP method is selected.
Selection of threshold level is relevant in POT. A low threshold (more exceedances) implies less variance and weak asymptotic support, but high bias. A high threshold (fewer exceedances) implies more variance and stronger asymptotic support, but low bias.
POT-Poisson-GPD models wind magnitudes over the threshold as a GPD, and time as a separated Poisson process (???). This method was used for the first time as statistical application in (???). The generalized Pareto family describes all possible limiting distributions of the distributions scaled above the threshold (???).
In POT-PP time and magnitude above the threshold are modeled using a two-dimensional Poisson process (???). This method was applied for the first time in (???).
There are many techniques to estimate the parameters of extreme value models, i.e. graphical methods, estimators based on moments, order statistics, and likelihood based (???). (???) supported the use of likelihood methods due to asymptotic normality guaranteed when shape parameter is greater than 0.5, excluding light tailed distributions with finite end point. In addition, likelihood method is easy to evaluate and solve numerically, and the calculation of standard errors and confidence intervals is possible using asymptotic theory.
Wind forces are important for infrastructure design (???). For a civil engineer main forces to consider for the design of a structure, for instance a bridge or a building, are (a) dead load due to the weight of the structure, and (b) live load due to earthquake and wind. For Colombia, the structure design standard has defined in great detail, all aspects related to seismic forces, and dead forces, but lack of detail in design wind velocities. Current wind velocities map is 20 years outdated, and is not appropriate for all types of structures, because it only includes two return periods.
It is well known that in recent years there have been accelerated changes in the climate of the planet, including issues related to winds. This aspect is reflected in frequent partial failures of structures due to wind forces (???), and in some cases including with total losses (???). Last five decades the way to assess wind loads in structural design has had remarkable changes (???).
A complete study of extreme wind forces, need to address separately hurricane and non-hurricane data, to include in the product the integration of results from both fronts (???). In the study area, hurricane winds are only present inland in the Caribbean Sea, therefore, only affects directly ‘San Andres y Providencia’ island - one (1) of thirty-three (33) states. In 1102 of 1103 municipalities (more than 99%), only non-hurricane winds are relevant. However, all municipalities located near to the northern onshore border may be impacted by side effects of hurricanes.
The national infrastructure design standard of Colombia, maintained by the Earthquake Colombian Association of Seismic Engineering AIS, uses km/h as official units for wind velocities. In this research km/h is always used, considering that output results will support the update of chapter B.6 (wind forces) of mentioned standard.
Nowadays, methodologies to deal with the inference of extreme wind maps are quite mature and advanced, and many of them are already implemented and ready for use. For this reason, the main contribution of this research is not related to the theoretical foundations of the methods themselves, but to application of the method in a particular case where good quality data is not available (???). Thereby, the gaps in which this research aims to contribute are related to the use of alternative data sources, and how to meet the downscaling challenge considering the lack of field measurement data coming from weather stations.
The main aim of this research is the estimation of wind extreme velocities to be used as input loads for the design of structures, considering their risk categories, and covering any place in the whole study area.
Specific objectives are:
Complement the lack of field measured wind data, with other sources of information, then, analyze and compare different time series, to select and use the best data source (or combination of sources), in case of downscaling support issue.
Select and apply a suitable probabilistic method to infer wind maps for infrastructure design.
Estimate extreme wind values for the stations in the selected input data source, considering non-hurricane approaches.
Allow the comparison of wind extreme values estimations, using different methods to verify and calibrate output results.
Generate continuous non-hurricane wind maps, using the most suitable spatial interpolation technique, considering the specific characteristics of the input data source.
Combine output maps from non-hurricane analysis with existing hurricane studies to obtain final maps for structural purposes.
Main question of this research is directed to calculate future wind extreme velocities (return levels) for infrastructure design, then the research is:
What wind extreme velocities need to be used as load design forces for structures of different use category, in the study area?
As mentioned before, case study in this research is Colombia a tropical country located in the northern part of South America. Its capital Bogotá is located in the center of the country at latitude \(4.6^\circ N\), and longitude \(74.1^\circ W\).
Despite the government Institute of Hydrology, Meteorology and Environmental Studies IDEAM maintains a network of weather stations, of which around 200 have anemometers measuring instant data every minute, it was impossible to obtain quality and complete measured data according to the needs of the present investigation, motivation to search for alternative data sources.
Nowadays in Colombia there are predefined requirements to design structures depending of its use category. The national standard for infrastructure design (???), following the American design standard (???), covers the design of all types of structures with the mean recurrence intervals MRIs 700, 1700, and 3000 years. In this way this research aims to calculate wind extreme velocities that will be equaled or exceeded with a probability equal to \(\frac{1}{MRI}\) in a given year, in other words, the velocities that will be equaled or exceeded only one time in mentioned periods. In terms of exposure time, understood as the time the structure will be in use, when the exposure time will be equal to those MRIs, the wind extreme velocities will have an occurrence compound probability of 67%.
Historically, hurricanes have only affected the Colombian Caribbean coast in a not very significant way, despite the fact that there have been significant events that have made landfall. The most likely areas to be affected by storms are the department of La Guajira and the island of San Andrés (???). In most cases the events that define the wind design loads in Colombia do not require hurricane data.
Main sections of thesis document are 1) Introduction, 2) Data, 3) Theoretical Framework, 4) Methodology, 5) Results and Discussion, 5) Conclusions, and 6) Annexes, from A to E.
After introduction, in second section Data, main information about data sources IDEAM, ISD, and ERA5 are described, including at the end additional details for ERA5 in Annex @ref(datadownload). Annex @ref(dbstoring) explains reasons for using PostgreSQL engine, and the database backup and restoration process.
Theoretical Framework section is dedicated to introduce statistical concepts that are basis for the investigation, both in probability distributions and in extreme analysis. Later, it is described in more detail, topics related to extreme value analysis (peaks over threshold with generalized Pareto POT-Poisson-GPD, and peaks over threshold with Poisson process POT-PP), and at the end, a summary report is done about wind load requirements for the study.
The Methodology chapter includes the processes needed to meet the objectives and answer the research question. Main components are data standardization, downscaling support, POT-PP, spatial interpolation, and integration with hurricane data.
Results and Discussion section shows, (1) all results for data standardization and comparison to support the downscaling issue, (2) all POT-PP results for one ISD station, (3) all output maps for ISD and ERA5 data sources including discussions of those finals results. These discussions are complemented by the Conclusions section.
To finalize the document, a series of appendices were created to facilitate the reproducibility of the research. Appendix @ref(rcode) contains research R code. It is necessary to considering that the code provided by Dr. Adam L. Pintar to do the de-clustering and thresholding in POT-PP is not there because its publication and distribution is not authorized. Appendix @ref(results) contains all results in digital format. Appendix @ref(datadownload) compliments the information needed to download ERA5 data. Appendix @ref(dbstoring) shows the use of PostgreSQL for data storage, and provides instructions for backup and restoring. Because the document for the thesis was done using package ´thesisdown´ (which is based in ´bookdown´) the most important document R code to create the thesis document, mainly graphics, is shown in Appendix @ref(docrcode). Finally, in Appendix F @ref(manual), a user manual is presented, in order to provide instruction to apply the same methodology in a different case study.
Input data was obtained from three different sources (a) Institute of Hydrology, Meteorology and Environmental Studies of Colombia IDEAM http://www.ideam.gov.co, (b) Integrated Surface Database ISD https://www.ncdc.noaa.gov/isd, and (c) climate reanalysis ERA5 https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5. Information about datasets, variables, and units is found in tables @ref(tab:tabledatasources1), @ref(tab:tabledatasources2), and @ref(tab:tabledatasources3) respectively.
| Institution | Dataset | Details |
|---|---|---|
| IDEAM | Historical records at weather stations | IDEAM is responsible for the installation, maintenance and management of all kind of weather stations located everywhere along the country |
| NOAA | ISD | ISD (Integrated Surface Database. NOAA’s National Centers for Environmental Information - NCEI) Lite: A subset from the full ISD dataset containing eight common surface parameters in a fixed-width format free of duplicate values, sub-hourly data, and complicated flags. |
| ECMWF | ERA5 | ERA5 is a reanalysis dataset with hourly estimates of atmospheric variables with horizontal resolution of 0.25º (33 kilometers), i.e. cells equally spaced every 0.25 degrees |
| Dataset | Variables | Description |
|---|---|---|
| IDEAM | vv_aut_2 | Instantaneous wind velocity each two (2) minutes |
| vv_aut_10 | Instantaneous wind velocity each ten (10) minutes | |
| vvmx_aut_60 | Maximum wind velocity each sixty (60) minutes | |
| ISD | v5 | Maximum hourly five seconds (5-s) wind gust velocity |
| ERA5 | fg10 | 10 meters wind gust since previous post-processing |
| fsr | Forecast Surface Roughness |
| Variable | Units | Time | Stations |
|---|---|---|---|
| vv_aut_2 | meters per second | Variable | 20 |
| vv_aut_10 | meters per second | Variable | 204 |
| vvmx_aut_60 | meters per second | Variable from 2001 until today. Irregular time series. | 203 |
| v5 | meters per second | Variable from 1941 until today. Note: There is too much variability in time (start, end, and time range) for each station. Irregular time series. | 101 |
| fg10 | meters per second | 1979-Today | 3381 |
| fsr | meters per second | 1979-Today | 3381 |
Ideal data source to create extreme wind speeds maps should be field observed data from IDEAM, but the IDEAM data have the following deficiencies:
There are not enough number of stations around the study area to represent all the local wind variability in a huge country with multiple variety of climates and changing thermal floors.
There are uncertainties related to the way anemometers are registering data, then comparison with other data sources are needed to be able to do appropriate data standardization, needed as a prerequisite for the analysis.
There is no time continuity in the registration of IDEAM data. Historical time series are different and variable in each station.
The different wind related variables provided by IDEAM are not well documented and their comparative values for identical time periods are not consistent with each other.
Other logistical difficulties in accessing IDEAM data source are also highlighted:
It was not possible to obtain the wind hourly mean variable.
It was impossible for them to calculate and deliver a representative gust velocity, for example 5-minutes gust.
There was also no access to complete raw data (instantaneous measurements every minute that they claim to have), to be able to calculate from them the variables required for the study.
Importance of ISD database for this study is based on the fact that post-processed ISD database has wind extreme values, and it was used to create extreme wind maps for United States. ISD allows comparison with IDEAM records to take better decisions in order to conduct data standardization. Despite that ERA5 data are not observed data (but forecast), its main advantage is their resolution (0.25 square decimal degrees) and availability.
Historical observed wind speeds from 203 stations in Colombia are managed by the official environmental authority IDEAM. Table @ref(tab:tableideamstations) shows a sample of five IDEAM stations. Figure @ref(fig:plotideamstations) shows a map of IDEAM stations. Figure @ref(fig:plotoneideamstation) shows data for IDEAM station “21205791”.
| Name[Code] | Latitude | Longitude |
|---|---|---|
| EMAS - AUT [26155230] | 5.09 | -75.51 |
| SAN BENITO - AUT [25025380] | 9.16 | -75.04 |
| AEROPUERTO ALFONSO LOPEZ - [28025502] | 10.44 | -73.25 |
| TIBAITATA - AUT [21206990] | 4.69 | -74.21 |
| ELDORADO CATAM - AUT [21205791] | 4.71 | -74.15 |
IDEAM Stations. Colombia
Time Series of IDEAM Station ELDORADO CATAM - AUT
ISD is a database with environmental variables, among them extreme wind speeds. ISD has data for the whole planet, and is based on observed data at meteorological stations in each country, which means that for Colombia is based on IDEAM data. Main advantage is data availability at neighbor countries and specialized post-processing made by NOAA’s National Centers for Environmental Information NCEI in United States, which facilitates its use. Table @ref(tab:tableisdstations) shows a sample of five ISD stations. Figures @ref(fig:plotisdstations) and @ref(fig:plotoneisdstation) shows a map of ISD stations and data from ISD station “802590”.
| Code | Name | Latitude | Longitude |
|---|---|---|---|
| 804400 | BARINAS | 8.62 | -70.22 |
| 800810 | ALTO CURICHE | 7.05 | -76.35 |
| 801000 | BAHIA SOLANO / JOSE MUTIS | 6.18 | -77.40 |
| 802590 | ALFONSO BONILLA ARAGON INTL | 3.54 | -76.38 |
| 803150 | BENITO SALAS | 2.95 | -75.29 |
ISD Stations. Colombia and Surroundings countries
ISD Station ALFONSO BONILLA ARAGON INTL - Time Series
ERA5 Cells and Stations (Cells Centers)
ERA5 is forecast reanalysis data processed by the European Centre for Medium-Range Weather Forecasts ECMWF with wind speeds time series in square cells of 0.25 decimal degrees covering the whole planet. It was extracted a raster of 69 rows by 49 columns in format NetCDF. Cell centers represent ERA5 stations, with IDs from 1 (lon=-79, lat=12.5) to 3381 (lon=-67, lat=-4.5). Map in figure @ref(fig:plotera5stations) shows ERA5 stations.
All data sources had different mechanisms for downloading. For IDEAM, the official procedure is through the e-mail atencionalciudadano@ideam.gov.co. For ISD all files are available in the FTP site ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-lite/, organized in folders by years with gzip files inside; there are many files by station (one file for each year available), with names in the format ID-99999-YYYY.gz, where ID is the USAF-ISD station identifier, and YYYY is the year. ERA5 data request uses a Python scripts with data size limit for download. Files with all IDEAM and ISD stations are available in Annex @ref(rcode). For the Python code and commands to join NetCDF files of ERA5 data source see the Annex @ref(datadownload). For data organization see the Annex @ref(dbstoring).
PDF defines the probability that a continuous variable falls between two points. In PDF the probability is related to the area below the curve (integral) between two points, as for continuous probability distributions the probability at a single point is zero. The term density is related to the quantity of probability defined below each part of the curve, the higher the values of the curve, the higher the density and, consequently, the probability.
\[\begin{equation} \mathrm{ \int_a^b f(x)dx = Pr[a \leq X \leq b] } (\#eq:pdf) \end{equation}\]
Equation @ref(eq:gumbelpdf) is the Gumbel PDF.
\[\begin{equation} \mathrm{ f(x)=\frac{1}{\beta} \exp\left\{ -\frac{x-\mu}{\beta} \right\} \exp\left\{ -\exp\left\{ -\left( \frac{x-\mu}{\beta} \right) \right\} \right\}, \quad -\infty < x < \infty } (\#eq:gumbelpdf) \end{equation}\]
where \(\exp\left\{.\right\}\) is \(\mathrm{e}^{\left\{.\right\}}\), \(\beta\) the scale parameter, and \(\mu\) the location parameter. Location (\(\mu\)) has the effect to shift the PDF to left or right along ‘x’ axis, thus, if location value is changed the effect is a movement to the left (small value for location), or to the right (big value for location). Scale has the effect to stretch (\(\beta > 1\)) of compress (\(0 < \beta< 1\)) the PDF, if scale parameter is close to zero it approaches a spike.
Figure @ref(fig:plotgumbelpdffunction) shows PDF with location (\(\mu\)) = 100 and scale (\(\beta\)) = 40, using Equation @ref(eq:gumbelpdf). Figure @ref(fig:plotgumbelpdf) shows PDF with location (\(\mu\)) = 100 and scale (\(\beta\)) = 40, using function dgumbel of the package RcmdrMisc.
Gumbel PDF
Gumbel PDF - dgumbel function
CDF is the probability of taking a value less than or equal to x. That is
\[\begin{equation} \mathrm{ F(x) = Pr[X < x] = \alpha } (\#eq:cdf1) \end{equation}\]
For a continuous variable, CDF can be expressed as the integral of its PDF.
\[\begin{equation} \mathrm{ F(x) = \int_{-\infty}^x f(x)dx } (\#eq:cdf2) \end{equation}\]
Equation @ref(eq:gumbelcdf) is the Gumbel CDF.
\[\begin{equation} \mathrm{ F(x) = \exp\left\{-\exp\left[-\left(\frac{x-\mu}{\beta}\right)\right]\right\}, \quad -\infty < x < \infty } (\#eq:gumbelcdf) \end{equation}\]
Figure @ref(fig:plotgumbelcdffunction) shows Gumbel CDF with location (\(\mu\)) = 100 and scale (\(\beta\)) = 40, using Equation @ref(eq:gumbelcdf). As previously done with PDF, similar result can be achieved using function pgumbel of package RcmdrMisc.
Gumbel CDF
PPF is the inverse of CDF, also called the quantile function. This is, from a specific probability get the corresponding value x of the variable.
\[\begin{equation} \mathrm{ x = G(\alpha) = G(F(x)) } (\#eq:ppf) \end{equation}\]
Equation @ref(eq:gumbelppf) is the Gumbel PPF.
\[\begin{equation} \mathrm{ G(\alpha) = \mu-\beta ln(-ln(\alpha)) \quad 0 < \alpha < 1 } (\#eq:gumbelppf) \end{equation}\]
Figure @ref(fig:plotgumbelppffunction) shows Gumbel PPF, using Equation @ref(eq:gumbelppf). Similar result can be achieved using function qgumbel of package RcmdrMisc.
Gumbel PPF
Figure @ref(fig:plotgumbelhffunction) shows Gumbel HF, using Equation @ref(eq:gumbelhf).
Gumbel HF
HF is the ratio between PDF and SF. SF is the survival function \(S(x) = 1 - F(x)\), which defines the probability that a variable takes a value greater than x, \(S(x) = Pr[X > x] = 1 - F(x)\).
\[\begin{equation} \mathrm{ h(x) = \frac{f(x)}{S(x)} = \frac{f(x)}{1-F(x)} } (\#eq:hf) \end{equation}\]
Equation @ref(eq:gumbelhf) is the Gumbel HF.
\[\begin{equation} \mathrm{ h(x)= \frac{1}{\beta}\frac{\exp(-(x-\mu)/\beta)}{\exp(\exp(-(x-\mu)/\beta))-1} } (\#eq:gumbelhf) \end{equation}\]
In order to approach the extreme value analysis, some statistical concepts are needed to understand the theoretical framework behind this knowledge area. This section introduces the concepts annual exceedance probability, mean recurrence interval MRI, exposure time, and compound probability for any given exposure time and MRI. As a hypothetical example, a simulated database of extreme wind speed will be used. This database is supposed to have 10.000 years of wind speeds.
Using the previously described database, a question arises to calculate the probability to exceed the highest probable damage caused to any structure by the action of the winds from this simulated database. It is possible to conclude that there is only one event greater or equal (in this case equal) to the highest probable causing damage in 10.000 years, and it is the highest wind. If we sort the database by wind magnitude in descending order (small winds last), the question is solved calculating the annual exceedance probability Pe with next formula
Sorted Wind Velocities by Magnitude
The annual exceedance probability \(P_e\) equals to the ratio between event index after descending sorting and years of simulations. The highest wind will be the first in the sorted list.
\[ P_e = \frac{1}{10.000}=0.001=0.01\% \] Same exercise can be done with all winds to construct the annual exceedance probability curve, that in this case will represent the probability to equal or exceed different probable damage due to wind.
Continuing with the previous section, if the inverse of the exceedance probability is taken, the return period (in years) is obtained. The return period or Mean Recurrence Interval MRI is associated with a specific return level (wind extreme velocity). MRI is the numbers of years (N) needed to obtain 63% of change that the corresponding return level will occur at least one time in that period. The return level is expected to be exceeded on average once every N-years.
The formula to calculate the annual exceedance probability of the return level depends on the MRI value as shown below:
\[\begin{equation} \mathrm{ P_e = \begin{cases} \begin{split} &1-\exp\left(-\frac{1}{MRI}\right),\;\textrm{for MRI < 10 years} \\ &\frac{1}{MRI},\;\textrm{for MRI }\geq\textrm{ 10years} \end{split} \end{cases} } (\#eq:pe) \end{equation}\]
For a specific wind extreme event A, the probability that the event will occur in a period equal to MRI years is 63%. If we analyze for the same period a strongest wind extreme event B, its occurrence probability will be less than 67%. If the purpose of this research is to design infrastructure considering wind loads, the structure will be more resistant to wind if we design with stronger winds, this is high MRIs, and low annual exceedance probability. Common approach for infrastructure design, considering any type of load (earthquake, wind, etc.) is to choose high MRI according to the importance/use/risk/type of the structure. For highly important structures, like hospitals or coliseums, where the risk of collapse must be diminished, the MRI used to design is higher in comparison to common structures (for instance a normal house), which implies less risks for its use and importance.
Compound Probability
It is possible to calculate a compound probability \(P_n\), where \(n\) is the exposure period. The exposure period is the usage time of the structures that have been designed with an extreme wind speed. \(P_n\) is the probability that the extreme wind speed will be equaled or exceeded at least one time in \(n\) years, and in this sense it is a probability of occurrence:
\[\begin{equation} \mathrm{ P_n = 1-\left(1-\frac{1}{MRI}\right)^n,\;\textrm{occurrence probability} } (\#eq:pn) \end{equation}\]
As a complementary probability, it is possible to calculate the compound non-occurrence probability as \(\left(1-\frac{1}{MRI}\right)^n\)
If it is considering exposure time as a multiple of return period, the resulting Figure @ref(fig:compoundprobability), shows that:
The example discussed here was presented as an instrument to introduce important concepts, nonetheless, there are specialized approaches to deal with extreme value analysis which will be discussed in Extreme Value Analysis Overview.
Analysis of extreme values is related with statistical inference to calculate probabilities of extreme events. Main methods to analyze extreme data are sample maxima, and Peaks Over Threshold POT. The sample maxima method also known as epochal or block maxima, is the classical approach and uses the most extreme value for a specific frame of time, typically one year. POT is based in the selection of a single threshold value to do the analysis only with values above the threshold. There are different POT approaches depending on how the time and magnitude dimensions are analyzed, the most common one uses generalized Pareto for wind velocities and Poisson process for time (POT-Poisson-GPD), and the most flexible one uses Poisson process simultaneously for both dimensions (POT-PP).
In both methods (Epochal and POT), the main step is to fit wind velocities to an appropriate probability distribution model. Epochal uses a generalized extreme value distribution GEV, a family of distributions which include extreme value type I - Gumbel, extreme value type II - Fréchet, and Weibull. Gumbel is the most common used GEV. POT uses a generalized Pareto distribution (POT-Poisson-GPD), or an intensity function (POT-PP).
Distribution models are fitted based in the estimation of its parameters, commonly called location, scale, and shape, nonetheless each model has its own parameters names. There are different methods to estimate parameters, among them: (a) method of modified moments (???), and L moments (???), (b) method of maximum likelihood MLE (???), (c) probability plot correlation coefficient, and (d) elemental percentiles (for GPD and GEV).
Once candidate parameters are available, it is necessary to assess the goodness of fit of the selected model using tests like Kolmogorov-Smirnov (KS), or Anderson-Darling. Here a visual assessment is also useful using a probability plot or a kernel density plot with the fitted PDF overlaid.
The main use of the fitted model is the estimation of mean return intervals MRI, and extreme wind speeds (return levels),
\[\begin{equation} \mathrm{ MRI=\frac{1}{1-F(y)} } (\#eq:mri) \end{equation}\]
with \(F(y)\) as the CDF. If \(1-F(y)\) is the annual exceedance probability, MRI is its inverse; see (???) for more details about MRI. If \(y\) is solved from previous equation using a given MRI of N-years, its value represents the \(Y_N\) wind speed return level. Refer to each specific method bellow for specific solutions to \(Y_N\).
The CRAN Task View “Extreme Value Analysis” https://cran.r-project.org/web/views/ExtremeValue.html shows available R for block maxima, POT by GPD, and external indexes estimation approaches. Most important to consider are evd, extremes, evir, POT, extremeStat, ismev, and Renext.
To work with random variables of sample maximum values, used probability distribution function PDF is GEV:
\[\begin{equation} \mathrm{ H(y) = \exp\left\{-\left[1+\xi\frac{x-\mu}{\psi}\right]_+^{-\frac{1}{\xi}}\right\} } (\#eq:epochalpdf) \end{equation}\]
Where \(\xi\neq0\), \([...]_+ = max([...],0)\), \(\mu\) is the location parameter, \(\psi > 0\) is a scale parameter, and \(\xi\) is a shape parameter. GEV combines in one unique family the Gumbel (medium-tailed) distribution (limit \(\xi\rightarrow0\)), Fréchet (long-tailed) distribution (\(\xi>0\)), and Weibull (short-tailed) distribution (\(\xi<0\)).
In this model, (a) the magnitude of the observations above the threshold are assumed to be independent random variables with the same generalized Pareto as probability distribution, \(\sigma\) as scale, and \(\xi\) as tail length, and (b) corresponding times are assumed to follow a one dimensional homogeneous Poisson process with \(\gamma\) as parameter.
With the condition of exceeding some high threshold \(b\), as a consequence \(y = (x-b) > 0\), the CDF of \(y\) is the generalized Pareto distribution GPD:
\[\begin{equation} \mathrm{ F(y) = 1 - \left[1-\xi\frac{y}{\sigma}\right]^{-\frac{1}{\xi}}_+, } (\#eq:gpd) \end{equation}\]
where \([...]_+ = max([...],0)\), \(b\) is the threshold. In both GPD (magnitude), and \(1D\) Poisson process (time), it is not possible to differentiate between thunderstorm and non-thunderstorm wind types. (???) found a rigorous connection between epochal (GEV), and limits results of POT-Poisson-GPD, as parameter shape \(\xi\) in both previous equations are exactly the same. The long-tailed case when \(\xi>0\), GPD behaves as usual Pareto distribution, for \(\xi = 0\) (taking the limit \(\xi\rightarrow0\)) it behaves as exponential distribution, and \(\xi<0\) the distribution has a finite upper endpoint at \(-\frac{\sigma}{\xi}\).
In POT-Poisson-GPD, to calculate return levels (RL), \(Y_N\), corresponding to the N-years return period, next equation is used,
\[\begin{equation} \mathrm{ Y_N =G\left(y, 1-\frac{1}{\lambda\,N}\right) } (\#eq:yngpd) \end{equation}\]
where \(G\) is the quantile function (PPF), and the value of the probability passed to the \(G\) function, has to be modified with the \(\lambda\) parameter. \(\lambda\) is the number of wind speed events over the threshold per year.
According to (???) the stochastic Poisson Process PP is mainly defined by its intensity function. As the intensity function is not uniform over the domain, the PP considered here is non-homogeneous, and due to the intensity function dependency of magnitude and time, it is also bi-dimensional. PP was described for the first time in (???), then extended in (???).
Generic Equation @ref(eq:ppgenericintensityfunction) shows the intensity function, which is defined in the domain \(D = D_t\,{\cup}\,D_{nt}\), and allow to fit the PP at each station to the observed data \(\{t_i, y_i\}_{i=1}^I\), for all the times (\(t_i\)) of threshold crossing observations, and its corresponding wind speeds magnitudes (\(y_i\)). Thus, only data above the threshold (POT) are used.
\[\begin{equation} \mathrm{ \lambda\left(y,t\right) \begin{cases} \begin{split} &\lambda_t(y),\;\textrm{for t in thunderstorm period} \\ &\lambda_{nt}(y),\;\textrm{for t in non-thunderstorm period} \end{split} \end{cases} } (\#eq:ppgenericintensityfunction) \end{equation}\]
The specific intensity function of the PP is defined in (???) and is shown in Equation @ref(eq:ppintensityfunction):
\[\begin{equation} \mathrm{ \frac{1}{\psi_t}\left(1+\zeta_t\frac{y-\omega_t}{\psi_t}\right)_+^{-\frac{1}{\zeta_t}-1} } (\#eq:ppintensityfunction) \end{equation}\]
where, at a given time \(t\), parameter \(shape = \zeta_t\) controls the tail length of the intensity function, and the other two parameters \(\omega_t\) and \(\psi_t\) define the location and scale of the intensity function.
Figure @ref(fig:plotdomainpp) represent the domain \(D\) of PP. In time, the domain represents the station service period from first sample \(t_1\) to last sample \(t_4\). \(D\) is the union of all thunderstorm periods \(D_t\) (from \(t_2\) to \(t_3\)), and all non-thunderstorm periods \(D_{nt}\) (periods \(t_1\) to \(t_2\) and \(t_3\) to \(t_4\)). In magnitude, only thunderstorm data above its threshold \(b_t\), and only non-thunderstorm data above its threshold \(b_{nt}\) are used.
Thunderstorms and non-thunderstorms are modeled independently:
Domain off Poisson Process - PP
Visual representation of the intensity function for PP can be seen in Figure @ref(fig:plotdomain3dpp). In vertical axis, two surfaces were drawn representing independent intensity functions for thunderstorm \(\lambda_t(y)\) and for non-thunderstorm \(\lambda_{nt}(y)\). The volume under each surface for its corresponding time periods and peak over threshold velocities, is the mean of PP.
Volume Under Surfaces: Mean of PP
To fit the intensity function to the data, the method of maximum likelihood is used to estimate its parameters, \(scale = \psi\), \(location = \omega\), and \(shape = \zeta\), the selected vector of parameters \(\eta\) are the \(\hat\eta = (\hat\psi, \hat\omega, \hat\zeta)\) values that maximizes next function
\[\begin{equation} \mathrm{ L(\eta)=\left( \prod_{i=1}^I\lambda\left(y_i,t_i\right) \right) \exp\left\{ -\int_{{D}}\lambda\left(y,t\right)dy\,dt \right\} } (\#eq:pplikelihood) \end{equation}\]
The values of \(\hat\eta\) need to be calculated using a numerical approach, because there is not analytical solution available.
Once the PP is fitted to the data, the model will provide extreme wind velocities (return levels), for different return periods (mean recurrence intervals).
A \(Y_N\) extreme wind velocity, called the return level (RL) belonging to the N-years return period, has an expected frequency to occur or to be exceeded (annual exceedance probability) \(P_e = \frac{1}{N}\), and also has a probability that the event does not occur (annual non-exceedance probability) \(P_{ne}=1-\frac{1}{N}\). \(Y_N\) will be the resulting value of the \(G\) (PPF or quantile) function using a probability equal to \(P_{ne}\). \(Y_N=quantile(y, p=P_{ne})=G(y,p=P_{ne})=PPF(y,p=P_{ne})\). \(Y_N\) can be understood as the wind extreme value expected to be exceeded on average once every N years.
For PP, \(Y_N\) is the solution to the next equation, which is defined in terms of the intensity function,
\[\begin{equation} \mathrm{ \int_{Y_N}^{\infty}\int_0^1\lambda\left( y,t\right)dydt = A_t\int_{Y_N}^{\infty}\lambda_t\left( y\right)\,dy + A_{nt}\int_{Y_N}^{\infty}\lambda_{nt}\left( y\right)\,dy = \frac{1}{N} } (\#eq:pprl) \end{equation}\]
where \(A_t\), is the multiplication of the average number of thunderstorm per year and the average length of a thunderstorm, taken to be 1 hour as defined in (???), and \(A_{nt} = 365 - A_t\). The average length of a non-thunderstorm event is variable, and it is adjusted for each station to guarantee that \(A_{nt} + A_t = 365\). Value 365 is used only, if operations with time in the dataset are performed in days.
The same thunderstorm event in considered to occur if the time lag distance between successive thunderstorm samples is small than six hours, and for non-thunderstorm this time is 4 days. For PP, all the measurements belonging to the same event (thunderstorm or non-thunderstorm), need to be de-clustered to leave only one maximum value. In other words, the number of thunderstorm in the time series is one plus the number of time lag distances greater than 6 hours, and above 4 days for non-thunderstorm.
As the output maps of this research will be used as input loads for infrastructure design, the methodology used for its creation, need to be consistent with Colombian official wind loads requirements. Colombian structure design code, from now the design standard, was created and is maintained by the Colombian Association of Seismic Engineering - AIS.
The design standard is mainly based in minimum design loads and associated criteria for buildings and other structures ASCE7-16 standard (???). The ASCE7-16 standard defines the minimum requirements for design wind loads in pages 733 to 747. Wind speeds requirements of ASCE7-16 are based in the combination of independent non-hurricane analysis, and hurricane wind speeds simulations models. The focus of this research will be the analysis of non-hurricane wind data, however, existing results of hurricane studies will be used to present final maps with both components. In ASCE7-16, for non-hurricane wind speed, the procedure is mainly based on (???).
ASCE7-16 (page 734), requires the calculation of wind extreme return levels for specific return periods according to the risk category of the structure to be designed, as follows: risk category I - 300 years, risk category II - 700 years, risk category III - 1700 years, risk category IV - 3000 years. In addition, extreme wind speeds for those MRI need to correspond to 3 seconds gust speeds at 33 ft (10 meters) above the ground and exposure category C. Below is a description of the risk categories:
To standardize wind speeds to gust speeds ASCE7-16 proposes the curve Durst, see (???), and Figure @ref(fig:durstcurve). Durst curve is only valid for open terrain conditions, and it shows in axis \(y\) the gust factor \(\frac{V_t}{V_{3600}}\), a ratio between any wind gust (maximum speeds) averaged at \(t\) seconds, \(V_t\), and the hourly averaged wind speed \(V_{3600}\), and in the axis \(x\) the duration \(t\) of the gust in seconds. Be aware that curve values in Figure @ref(fig:durstcurve) are approximated values taken visually from the original curve.
Durst Curve
This research is focus in non-hurricane data, with three main elements: data, temporal analysis with POT-PP, and spatial analysis to do spatial interpolation and create return levels RL maps for MRIs of 700, 1700, and 3000 years. Core steps (1, and 3 to 7) need to be done in an iterative process station by station as is shown in Figure @ref(fig:mainmethodology).
Iterative Process in Methodology
Figure @ref(fig:methodology) shows the methodological scheme where the main elements mentioned are highlighted using shaded boxes. Steps 1 to 8 are the most representative, but step 2 is a data verification process that can be done once (in bulk) for all stations.
Methodology
Once the iterative cycle ends and the RL are calculated in all the stations, continuous surfaces will be created, one for 700 years, next for 1700 years, and finally for 3000 years. An additional element, is the integration with existing hurricane studies to produce final maps, that will be used as input loads for infrastructure design, and will be part of the design standard
Analysis of extreme wind speeds requires data standardization as initial step. All input data must be standardized to represent three important conditions: a) anemometer height of 10 meters, b) open space terrain roughness (exposition C), and c) averaging time of 3-seconds wind gust. (???) defines exposition C as areas with few obstructions, and exposition D refers to perfect open space.
Parallel to the standardization activity described below, it is also important to consider for all stations involved in the analysis:
Separating: As far as possible, identify each record of the time series, as thunderstorm (t) or non-thunderstorm (nt)
Filtering: Remove wind speeds above \(200 \frac{km}{h}\) and data pertaining to hurricane events, because the procedure with hurricane requires a different approach and need to be done independently
According to the protocol for field data collection and location of methodological stations (???), the anemometer (wind sensor) in installed always to a fixed height of 10 meters from the surface, as is shown in Figure @ref(fig:anemometer); therefore, no height correction is needed.
Anemometer height: 10 meters
Due to the effects that the terrain has on wind speed, a correction should be applied if the station is located in a geographical space considered “not open terrain”. When terrain is open, the roughness corresponds to 0.03 meters. There are some alternative methodologies to calculate the roughness; for example, (???) uses the station data, but the separation of the measurements should not exceed one minute (something difficult to obtain), (???) uses the empirical Equation @ref(eq:zo) (recommended in ASCE7-16 page 743, equation C26.7-1) to calculate roughness \(z_o\), which was used here,
\[\begin{equation} \mathrm{ z_0= 0.5\;H_{ob}\;\frac{S_{ob}}{A_{ob}} } (\#eq:zo) \end{equation}\]
where \(H_{ob}\) is the average height of the obstacles, \(S_{ob}\) is the average vertical area perpendicular to the wind of the obstacles, and \(A_{ob}\) is the average area of the terrain occupied by each obstruction. The empirical exponent \(\alpha\), gradient height \(z_g\), and exposure coefficient \(K_z\), are used to calculate the correction factor \(F_{exposition}\), for \(z_0\) units are in meters.
\[\begin{equation} \mathrm{ \alpha = 5.65*z_0^{-0.133} } (\#eq:alpha) \end{equation}\]
\[\begin{equation} \mathrm{ z_g=450*z_0^{0.125} } (\#eq:zg) \end{equation}\]
\[\begin{equation} \mathrm{ K_z= 2.01*\left(\frac{z}{z_g}\right) } (\#eq:kz) \end{equation}\]
\[\begin{equation} \mathrm{ F_{exposition} = \frac{0.951434}{K_z} } (\#eq:fexpo) \end{equation}\]
According to (???), calculation of roughness needs to be weighted according to the predominance of wind magnitude in eight directions (north, south, east, west, north-east, north-west, south-east, and south-west) around the station location. The calculation in each direction can be done using a detailed aerial photo or satellite image of the station, including a radius of 800 meters. Figure @ref(fig:compassrose) shows the wind percentages in mentioned directions for a generic station. Figure @ref(fig:lettaustation2) shows the satellite image for Vanguardia ISD station (USAF:802340), located in Villavicencio airport, with four (south, north, east, and west) 45 degree sectors highlighted
Wind Rose with Wind Percentages
Digital Imagery for ‘Vanguardia’ ISD Station (USAF:802340)
Figure @ref(fig:lettauexamples2) shows extreme conditions for roughness, open space in left image (ISD Station 804070) with roughness value of 0.03, closed space in center image (ISD Station 803000) with roughness value of 0.1, and a typical example where mentioned Lettau procedure is needed because roughness is different in each direction, in right image. Lettau Equation @ref(eq:zo) need to be applied to each direction and then the final \(z_o\) value is the weighted average, using historical wind percentage. See Figure @ref(fig:lettauvalues) showing the strokes made to calculate the different areas for two Colombian stations, in red the area occupied by the obstacles, and in blue the perpendicular area (???). Information about wind percentage per direction at each station were obtained from (???).
Roughness. Open (L), Closed (C), and Lettau (R).
Lettau Calculation
To transform hourly mean wind velocity \(V_{3600}\), to 3-s gust velocity \(V_3\), ASCE7-16 recommends the use of (???). In curve Durst the axis \(x\) represents the duration \(t\) of the gust, what is done is to look there for the value 3 seconds, and read the corresponding gust factor \(\frac{V_t}{V_{3600}}\) in axis \(Y\). For instance, using variable \(V_{3600}\) from IDEAM data source, the gust factor for 3-s gust is 1.51.
\[\begin{equation} \mathrm{ V_t = V_{3\,\textrm{seconds}} = \textrm{(gust factor)} V_{3600} } (\#eq:vt) \end{equation}\]
It is valid only for open terrain conditions. Durst curve shows in axis \(y\) the gust factor \(\frac{V_t}{V_{3600}}\), a ratio between any wind gust averaged at \(t\) seconds, \(V_t\), and the hourly averaged wind speed \(V_{3600}\), and in the axis \(x\) the duration \(t\) of the gust in seconds.
Where it is necessary to complement the local/regional wind analysis, with data from ISD (output data of a model for extreme winds), and ERA5 reanalysis dataset (large scale forecast data), it is required to probe by means of comparisons (exploratory data analysis and/or statistical measures) that those sources (modeled and forecast) are similar to IDEAM field measurements.
The proposed mechanism in the search for downscaling support is, (a) the creation of common time series graphs, where time series from all data sources are expected to be similar, and (b) the elaboration of scatter plots graphics, which are generated matching two sources in time (sorted in ascending order by wind velocity). By visual inspection is possible to evaluate data similarity between sources, when all the points fall very close to a 45-degree line. In both cases, the strategy for station matching, could be one of the following:
Manual matching, doing a detailed analysis station by station (only for ISD and IDEAM). While it is true that ISD is based on IDEAM, their names and locations are somewhat different, for this reason, it is necessary to read information available from each source, and decide station by station, about its correspondence.
Intersection matching, between ISD and IDEAM point stations and ERA5 cells. All ISD and IDEAM stations falling inside a ERA5 cell, will be compared between them.
Similar to how the adjustment of statistical data to a normal distribution is done to make inferences, in extreme value analysis only some part of the data (those that are extreme - over a high threshold - POT) needs to be fitted to a PP considering extreme deviations from the mean. While in the first case (normal distribution) the inferences are for events similar to the samples, in this case, when working with extreme value theory, the inferences will be for more extreme events than any previously observed or measured.
In summary, POT means only to work with extreme values, and PP means to adjust data to a PDF, which depends on an intensity function \(\lambda(t,y)\), where \(t\) is time, \(y\) is wind extreme velocity. As shown in Figure @ref(fig:plotdomainpp), in a POT-PP approach with domain \(D\), all the observations follow a Poisson distribution with mean \(\int_D\lambda(t,y)\,dt\,dy\). Main advantage of POT-PP is that it is designed to consider storm and not-storm events independently (for each disjoint sub-domain \(D_1\) or \(D_2\) inside \(D\), the observations in \(D_1\) or \(D_2\) are independent random variables), but in the end use them both for the inferences,
\[\begin{equation} \mathrm{ PDF = f(t,y|\eta) = \frac{\lambda(t,y)}{\int_D\lambda(t,y)\,dt\,dy} } (\#eq:pppdf) \end{equation}\]
To make the assumptions of PP more justifiable, it is important to have only one sample per event: the highest one. For instance, if a hypothetical storm started at 11:30 in the morning and ended at 12:30 in the afternoon, and the time series for that event has thirty wind measurements (one each two minutes), it is necessary to leave only the stronger or maximum value; this process is called de-clustering. In Figure @ref(fig:declustering), two thunderstorm clusters are shown, and only red samples are used to fit the PP. POT-PP defines that all the adjacent observations separated by six hours (6) or less in the case of thunderstorm events, and four (4) days or less, in the case of non-thunderstorm events, belong to the same cluster.
De-clustering in PP
As the POT model requires to work only with the most extreme values in the time series, it is necessary to select a threshold to filter out small values. Bias is high when a low threshold is selected (many exceedances) because the asymptotic support is weak; opposite situation happens for high thresholds where variance is potentially high. According to (???), it is necessary to select a threshold value, consistent with model structure.
POT - Thresholding
The procedure to choose the best thresholds pairs, one for thunderstorm, and other for non-thunderstorm, is based in the W transformation. POT-PP needs selection of the best threshold pairs \(b_t\) and \(b_{nt}\) (see Figure @ref(fig:plotdomainpp)) that produces the optimal fit. Measurement of this threshold fitting is done through \(W\) statistic. If wind variable \(y\), in a POT-PP approach, has a \(CDF = U = F(y)\), then \(F(y)\) is distributed as uniform between 0 and 1 uniform(0,1), meaning that the transformation \(W = -log(1-U)\) is an exponential random variable with mean one (1).
\[\begin{equation} \mathrm{ CDF = U= F(y) = P(y \leq Y) = \frac{\int_b^Y\lambda(y,t)\,dy}{\int_b^\infty\lambda(y,t)\,dy} } (\#eq:ppcdf) \end{equation}\]
W-statistic is done comparing the ordered result of applying \(W = -log(1-U)\) to the data (the axis \(y\) in Figure @ref(fig:wstatistics)) with the theoretical quantiles of an exponential variable with uniform distribution between 0 and 1 (axis \(x\) in same figure). W-statistic is the highest vertical distance between the 45º line and the points in the graphic. The best thresholds pairs return the minimum value for W-statistics after testing, in an iterative process, with many threshold pairs combinations.
POT - Thresholding W Statistic
PP requires to remove long periods of time when stations were not recording or failing. Proposed time in (???) is 180 days, namely, to remove all the gaps from the time series larger than six months.
Probability density function PDF, and cumulative distribution function CDF, of the PP, depend of the intensity function, and are shown in Equation @ref(eq:pppdf), and Equation @ref(eq:ppcdf), respectively.
To facilitate the estimation of the parameters for the PP intensity function, parameter \(shape = \zeta_t\) is taken to be zero in Equation @ref(eq:ppintensityfunction), then doing the limit, the resulting intensity function is the same as the GEV type I or Gumbel distribution,
\[\begin{equation} \mathrm{ \frac{1}{\psi_t}\exp\left\{\frac{-(y-\omega_t)}{\psi_t}\right\} } (\#eq:ppusedif) \end{equation}\]
In this study, used intensity functions are:
\[\begin{equation} \mathrm{ \lambda\left(y,t\right)= \begin{cases} \begin{split} &\frac{1}{\psi_s}\exp\left(\frac{-(y-\omega_s)}{\psi_s}\right),\;\textrm{for t in thunderstorm period} \\ &\frac{1}{\psi_{nt}}\exp\left(\frac{-(y-\omega_{nt})}{\psi_{nt}}\right),\;\textrm{for t in non-thunderstorm period} \end{split} \end{cases} } (\#eq:ppspecificintensityfunction) \end{equation}\]
As is shown in @ref(fig:fitif), the fitting process involve finding the best group of parameters of the intensity function, in such a way that the red curve (PDF of the PP, based in intensity function) be as tight as possible to the shape of the data histogram. As is described in POT-PP, optimal parameters to do the fitting process of the intensity function are calculated using maximum likelihood.
POT - PP Intensity Function Fitting Process
A hazard curve is shown in Figure @ref(fig:hc), where axis \(x\) represents annual exceedance probability \(P_e = \frac{1}{N}\), and axis \(y\) represents the return level RL \(Y_N\) for the corresponding N-years return period. It is possible to obtain the extreme return wind velocity level for any given return period going from axis \(x\) to axis \(y\) through the curve.
POT - PP Hazard Curve
In this research POT-PP includes only time series classified as non-thunderstorm, and this implies that the intensity function to be used (Equation @ref(eq:ppspecificintensityfunction)) does not differentiate between wind types (thunderstorm and non-thunderstorm), i.e., the intensity function is not a function of time t.
Hazard curve can be created solving \(Y_N\) in Equation @ref(eq:pprl) for a specific value of N in years (MRI). As a bad estimation caused by the deficiencies in the available information for the case study, the average duration time of non-thunderstorm events by year is considered to be one year, i.e., the parameter \(A_{nt}\) in Equation @ref(eq:pprl) is equal to one, and \(A_t\) is equal to zero (units in years).
Considering that the intensity function is not a function of time, equation @ref(eq:pprv) can be used replacing directly the parameters of the intensity function (PP) and the return periods (N), to create the hazard curve and get RL:
\[\begin{equation} \mathrm{ Y_N=\frac{\psi}{\zeta}\left[-log\left(\frac{N-1}{N}\right)\right]^{-\zeta}-\frac{\psi}{\zeta}+\omega } (\#eq:pprv) \end{equation}\]
As for this research \(\zeta = 0\) in selected intensity function (Equation @ref(eq:ppspecificintensityfunction)), return levels \(Y_N\) can be calculated with the Gumbel quantile function using \((1-\frac{1}{N})\) as probability. This alternative approach is only valid when the analysis of POT-PP includes only one type of event (thunderstorm or non-thunderstorm). The connection between the intensity function of PP and the Gumbel function (variant of the GEV) is described in (???)
Probabilistic (Kriging) and deterministic (IDW, local polynomials) techniques are used to create maps for return levels with same return period. Interpolation with Kriging requires verification of minimum technical requirements to ensure proper use of the method, particularly:
Structural analysis, which includes data normality check, for example with Kolmogorov Smirnov or Shapiro Wilk goodness of fit tests, and if needed, data transformation to ensure data normality, e.g. using Box-Cox, and in addition, trend analysis to verify the need for trend modeling in subsequent steps.
Semivariance Analysis: Use of available tools like cloud semivariogram, experimental semivariogram, directional semivariograms to verify isotropy or anisotropy, and different theoretical semivariograms, to ensure the best model for spatial autocorrelation, as a preliminary step to interpolation.
Kriging Predictions: Use of different types of Kriging predictors, like simple, ordinary, universal, based on the results of the structural analysis.
Cross Validation: Use of statistics like root mean square, average standard error, mean standardized, and root mean square standardized, that allow to measure the quality of the predictions and the magnitude of the errors.
Possible advantage of deterministic methods, is a better assessment of the local variability of spatial autocorrelation. It can also be considered with IDW or local polynomials a detailed assessment of structural analysis and cross validation. At the end of the spatial interpolation analysis all the predictions can be compared to select the most suitable result.
Main references in this research related to this matter, using R software are (???), (???), and (???). For the implementation of spatial statistics using vector or raster format, see (???), (???), and (???).
ASCE7-16 proposes the equation C26.5-2 for combination of statistically independent events, of non-hurricane and hurricane wind speed data.
\[\begin{equation} \mathrm{ P_e(y>Y_N) = 1 - P_{NH}(y<Y_N)P_{H}(y<Y_N) } (\#eq:combination) \end{equation}\]
where \(P_e(y>Y_N)\) is the annual exceedance probability for the combined wind hazards, \(P_{NH}(y<Y_N)\) is the annual non-exceedance probability for non-hurricane winds, and \(P_{H}(y<Y_N)\) is the annual non-exceedance probability for hurricane winds.
To understand Equation @ref(eq:combination), it is important to remember that to calculate return level \(Y_N\), for a given N-year return period, the exceedance probability \(\frac{1}{N}\) of \(Y_N\) is calculated. Then, the non-exceedance probability for \(Y_N\) is \(\left(1-\frac{1}{N}\right)\). The procedure consists in the creation of a new hazard curve, calculating all \(P_e(y>Y_N)\) values for different \(Y_N\) return levels, combining hazard curves from non-thunderstorm and thunderstorm data.
Equation @ref(eq:combination) can be expressed only in terms of exceedance probabilities, \(P_{e} = 1 - (1 -P_{nh}) (1 - P_{h})\), where \(P_{nh}\) is the the annual exceedance probability for non-hurricane winds, and \(P_{h}\) is the annual exceedance probability for hurricane winds. A graphical explanation of the procedure to calculate the combined \(P_e\) for the return level \(30\frac{Km}{h}\), is shown in next Figure @ref(fig:combinedhc). For each cell in the study area, it is necessary to calculate a new combined hazard curve, this is, all the \(P_e\) values corresponding all different return levels (Figure @ref(fig:combinedhc)).
Integration of Hurricane and Non-Hurricane Data
The procedure followed in this research to generate the final maps, requires that hurricane and non-hurricane wind maps have already been generated in raster format for main return periods (10, 20, 50, 100, 250, 500, 700, 1000, 1700, 3000, and 7000 years). As many maps as possible are required for different return periods to estimate detailed enough hazard curves from return values (cell values).
The main elements of the implemented algorithm are: (a) select the cell size for the integrated map as the maximum cell size of the input maps (hurricane and non-hurricane), (b) create an empty final raster with cell size from previous step, (c) recreate input maps using zonal statistics, where zone is the empty raster, to leave only the maximum value of the input cells that fall within each cell in zone, (c) for each available tile in integrated map, recreate non-hurricane and hurricane hazard curves (using result from previous step), (d) use Equation @ref(eq:combination) to calculate the integrated return level for each tile of the final map, and (e) create a final raster for each main return period.
An alternative approach for non-hurricane and hurricane data integration is apply Equation @ref(eq:combination) with non-hurricane hazard curves obtained at each station (before spatial interpolation) in combination with hurricane hazard curves from existing studies (obtained at same location of non-hurricane stations), and finally apply a spatial interpolation process. Main disadvantage of this procedure is that no non-hurricanes wind maps will be generated. Main advantage is that the computation time and the complexity of the integration algorithm are comparatively low, since the integration is only done in the location of the stations and not in all cells.
For the selection of the method to apply in this section, it is recommended to analyze among other aspects: (a) differences in the spatial resolution (e.g. cell size) of both types of studies need to be evaluated, because it can impact the quality of the final product, (b) prediction errors resulting from spatial interpolation can be amplified at specific cells locations, (c) creation of the algorithm, i.e. use of st_apply function of stars package (???) (apply functions to raster dimensions) to avoid cell-by-cell calculations, and (d) computer processing time.
This section has four main elements. First, the data source comparison (post standardization process) to face the downscaling issue, then, the process of fitting POT-PP to the ISD station 801120, next, non-hurricane maps (ISD and ERA5) and final maps will be displayed, and finally, a detailed discussion of the results and future work is highlighted.
Looking for a statistical justification in the use of ISD (model) and ERA5 databases (forecast), as input data for this study, and considering the downscaling approach presented previously, data sources ISD and IDEAM were standardized to enable comparison. Standardization consisted of transforming the data to be equivalent to \(V_3\) (3-s gust), 10 meters of anemometer height, and open space roughness. In the comparison process, it was checked if the velocity values (standardized) in the three sources, for equal stations and dates, were similar in magnitude.
None of the sources required anemometer height standardization. (???) was used for roughness standardization of ISD and IDEAM, applying the method station by station. Gust velocities standardization was done using Durst curve. To obtain \(V_3\) from Durst curve, it was required to start from \(V_{3600}\) (average hourly speed), or from a different wind gust speed, for instance \(V_5\) (5-s gust). Some elements that are relevant to each data source are described below.
ERA5:
ISD:
Wind velocity from ISD comes from source as \(V_5\), that is, five seconds gust wind velocity. To standardize from \(V_5\) to \(V_3\), using Durst curve, the correction factor is 1.03.
Wind velocity \(V_5\) from 74 ISD stations, was standardized station by station, using procedure described in sections Surface Roughness at Open Terrain, and Averaging Time 3-s Gust.
IDEAM:
It was not possible to obtain the average hourly speed \(V_{3600}\) directly from the institute (IDEAM), see Table @ref(tab:tabledatasources2), but it was calculated from received variables: a good estimator from instantaneous wind velocity each 2 minutes VV_AUT_2, and a poor estimator from instantaneous wind velocity each 10 minutes VV_AUT_10.
The Durst curve with \(V_{3600}\) was used to calculate gust speeds. To standardize from \(V_{3600}\) to \(V_3\), the correction factor is 1.51.
The available IDEAM data allowed two comparison processes, with quality data for few stations, and with low quality data, but available for all stations. In both cases, to make the use of ISD and ERA5 viable, its time series are expected to be as similar as possible to IDEAM (field measurements). As was described in methodology section, to verify similarity two types of graphics were constructed: time series overlay, and scatter plot graphics.
The IDEAM variable instantaneous wind velocity each 2 minutes (VV_AUT_2) was available for twenty (20) stations, of which only twelve (12) were perfectly equivalent to ISD stations (Table @ref(tab:table12stations) and map in left panel of Figure @ref(fig:qualitycomparison2)). VV_AUT_2 dataset was transformed to \(V_{3600}\) (average hourly speed) averaging all 20 values available per hour. For twelve matching stations, wind velocity \(V_{3600}\) was standardized station by station, using procedure described in Surface Roughness at Open Space section and Averaging Time 3-s Gust. Finally, for the same twelve ISD and IDEAM standardized stations, a comparison was done against matching ERA5 stations (the corresponding cell in ERA5 that has within ISD and IDEAM locations).
The stations described in each row of the Table @ref(tab:table12stations) were compared by generating scatter plots and common time series graphics. For only the stations corresponding to one row there was high correspondence between sources. Unfortunately, in the other eleven stations downscaling support was not reflected.
The stations corresponding to the map in the right panel of Figure @ref(fig:qualitycomparison2) and scatter plot in Figure @ref(fig:sideamera5), show high correspondence between sources because green regression line (empirical) is very similar to \(45^\circ\) line (theoretical). Axis \(x\) in m/s correspond to IDEAM stations 28025502, and axis \(y\) with same units contains ERA5 416 station (cell with center point in \(-73.25^\circ\) longitude, and \(10.5^\circ\) latitude). The points in the upper part of the graph that move away from the global trend of the correlation, correspond to erroneous field measurements (sensor failure) of the IDEAM meteorological station.
| ISD ID | IDEAM ID | ERA5 ID, (col,row), [lon,lat] |
|---|---|---|
| 803980 | 48015050 | 3320, (37, 68), [-70, -4.25] |
| 803700 | 52055230 | 2309, (6, 48), [-77.75, 0.75] |
| 802110 | 26125061 | 1582, (14, 33), [-75.75, 4.5] |
| 802100 | 26125710 | 1533, (14, 32), [-75.75, 4.75 |
| 801120 | 23085270 | 1240, (15, 26), [-75.5, 6.25] |
| 801100 | 27015330 | 1240, (15, 26), [-75.5, 6.25] |
| 800970 | 16015501 | 909, (27, 19), [-72.5, 8] |
| 800940 | 23195502 | 1102, (24, 23), [-73.25, 7] |
| 800630 | 13035501 | 749, (14, 16), [-75.75, 8.75] |
| 800360 | 28025502 | 416, (24, 9), [-73.25, 10.5] |
| 800350 | 15065180 | 221, (25, 5), [-73, 11.5] |
| 800280 | 29045190 | 312, (18, 7), [-74.75, 11] |
IDEAM VV_AUT_2 - Quality Data Comparison
Quality Data Comparison. High Similarity Between Sources
Quality data comparison of instantaneous wind velocity each 2 minutes (VV_AUT_2)
IDEAM variable instantaneous wind velocity each 10 minutes (VV_AUT_10) was available for 204 stations, despite \(V_{3600}\) calculated from this source is not an accurate or quality estimator, the comparison results are shown in Figure @ref(fig:poorcomparison). Downscaling support was ‘Good’ comparing IDEAM and/or ISD stations with twenty-three (23) ERA5 stations (2261, 1971, 2066, 2020, 2260, 1875, 2213, 2637, 1442, 1583, 1501, 1582, 1381, 1493, 1485, 1397, 1338, 1055, 511, 1644, 515, 221, 1038), and ‘Very Good’ comparing IDEAM and/or ISD with five (5) ERA5 stations (265, 360, 78, 312, 416).
Table @ref(tab:tableverygood) shows in each row compared stations with ‘Very Good’ downscaling results. Figure @ref(fig:verygoodxts) shows an example of a very good match in the time series plot for the ERA5 station 78 vs IDEAM stations 15075501 and 15079010. Figure @ref(fig:verygood) shows four different very good matching scatter plots: (a) IDEAM 15015120 vs ERA5 265, (b) IDEAM 29004520 vs ERA5 312, (c) IDEAM 15079010 vs ERA5 78, and (d) IDEAM 15075501 vs ERA5 78. Red line in each graphic represent the desired \(45^\circ\) line, where all points should fall, if the data sources would be exactly the same (theoretical behavior when there is equivalence of sources), and green line represents the linear regression line (empirical or real behavior when making the comparison).
IDEAM VV_AUT_10. Non-Quality Data Comparison
| ISD ID | IDEAM ID | ERA5: ID, (col,row), [lon,lat] |
|---|---|---|
| NA | 16015501 | 78, (29, 2), [-72, 12.25] |
| NA | 15079010 | 78, (29, 2), [-72, 12.25] |
| NA | 15075501 | 78, (29, 2), [-72, 12.25] |
| NA | 15015120 | 265, (20, 6), [-74.25, 11.25] |
| NA | 29004520 | 312, (18, 7), [-74.75, 11] |
| 800280 | 29045190 | 312, (18, 7), [-74.75, 11] |
| NA | 29045000 | 360, (17, 8), [-75, 10.75] |
| NA | 28025502 | 416, (24, 9), [-73.25, 10.5] |
| 800360 | 28035060 | 416, (24, 9), [-73.25, 10.5] |
For some ISD stations in Table @ref(tab:tableverygood), the value ‘NA’ means that for the corresponding ERA5 and IDEAM station (same row), there is not an ISD station located inside the ERA5 cell space (\(0.25^\circ * 0.25^\circ\)). Velocities in the axis \(y\) of Figure @ref(fig:verygoodxts) and all the axis in Figure @ref(fig:verygood) are in m/s.
Time Series Graphic for ‘Very Good’ Downscaling Support
Scatter Plots for ‘Very Good’ Downscaling Support
Figure @ref(fig:station801120) shows the satellite image (source Google Earth) of ISD station 801120, located in the international airport ‘José María Córdova’, municipality of Rio Negro (Antioquia, Colombia), with latitude \(6.125^\circ\), and longitude \(-75.423^\circ\) WGS84 coordinates. Red circle represents an influence radius of 800 meters. Table @ref(tab:cf801120) shows different calculations related to correction factors applied to this station, using procedure described in sections Surface Roughness at Open Terrain, and Averaging Time 3-s Gust.
Location of ISD Station 801120
As storm information is not available for any of the data sources, all the data for the station was classified as non-thunderstorm. According to POT-PP method described in Methodology, the first process applied to original time series raw data, is De-clustering, and then, Thresholding.
Non-thunderstorm raw data for ISD station 801120 has 2931 records, from 1986-12-06 12:00:00 to 2019-03-01 12:00:00, corresponding to a total amount time in days of 11739, and to an average number of events per year of 18.9, which means that the average duration of an event is 19.3 days (average size in days of a cluster). After De-clustering and Thresholding processes, the number of records decreases to 181. Time series graphics are related in Figure @ref(fig:ts), showing the data before (left) and after (right) applying the mentioned processes. Detailed yearly statistics are reported in Table @ref(tab:years801120), also including summary for before (left), and after (right).
| Year | Count | Mean | Min | Max | Count | Mean | Min | Max |
|---|---|---|---|---|---|---|---|---|
| 1986 | 63 | 45.2 | 27.9 | 163.3 | 7 | 106.4 | 43.8 | 163.3 |
| 1987 | 192 | 36.1 | 26.7 | 87.6 | 10 | 61.0 | 45.0 | 87.6 |
| 1988 | 234 | 43.8 | 26.7 | 90.4 | 23 | 64.2 | 45.0 | 90.4 |
| 1989 | 256 | 44.2 | 27.9 | 103.6 | 19 | 64.4 | 45.0 | 103.6 |
| 1990 | 250 | 44.9 | 26.7 | 103.6 | 21 | 67.2 | 45.0 | 103.6 |
| 1991 | 149 | 38.7 | 26.7 | 127.5 | 20 | 58.6 | 45.0 | 127.5 |
| 1992 | 126 | 35.2 | 26.3 | 81.7 | 9 | 52.6 | 43.8 | 81.7 |
| 1993 | 109 | 36.3 | 26.3 | 79.7 | 13 | 53.5 | 43.8 | 79.7 |
| 1994 | 124 | 36.8 | 26.7 | 79.7 | 12 | 56.1 | 45.0 | 79.7 |
| 1995 | 89 | 33.3 | 26.7 | 111.5 | 2 | 77.7 | 43.8 | 111.5 |
| 1996 | 70 | 35.6 | 26.7 | 87.6 | 6 | 65.7 | 43.8 | 87.6 |
| 1997 | 71 | 36.6 | 26.7 | 119.5 | 4 | 86.9 | 49.0 | 119.5 |
| 1998 | 65 | 33.8 | 27.9 | 61.4 | 2 | 54.6 | 47.8 | 61.4 |
| 1999 | 48 | 31.7 | 26.7 | 47.8 | 1 | 47.8 | 47.8 | 47.8 |
| 2000 | 69 | 33.4 | 26.7 | 87.6 | 3 | 68.3 | 55.8 | 87.6 |
| 2001 | 62 | 29.9 | 26.7 | 39.8 | 0 | NA | NA | NA |
| 2002 | 94 | 33.3 | 26.7 | 71.7 | 5 | 54.2 | 43.8 | 71.7 |
| 2003 | 78 | 31.5 | 26.7 | 71.7 | 1 | 71.7 | 71.7 | 71.7 |
| 2004 | 60 | 31.9 | 26.7 | 51.8 | 2 | 48.4 | 45.0 | 51.8 |
| 2005 | 59 | 33.3 | 26.7 | 94.4 | 2 | 69.1 | 43.8 | 94.4 |
| 2006 | 55 | 32.6 | 26.7 | 164.1 | 1 | 164.1 | 164.1 | 164.1 |
| 2007 | 25 | 29.8 | 26.7 | 39.0 | 0 | NA | NA | NA |
| 2008 | 13 | 36.1 | 26.7 | 96.4 | 1 | 96.4 | 96.4 | 96.4 |
| 2009 | 36 | 31.6 | 26.7 | 82.1 | 1 | 82.1 | 82.1 | 82.1 |
| 2010 | 31 | 43.0 | 27.9 | 119.5 | 8 | 83.0 | 61.4 | 119.5 |
| 2011 | 32 | 29.2 | 26.7 | 41.0 | 0 | NA | NA | NA |
| 2012 | 82 | 31.9 | 26.7 | 87.6 | 4 | 64.5 | 43.0 | 87.6 |
| 2013 | 91 | 29.7 | 26.7 | 37.0 | 0 | NA | NA | NA |
| 2014 | 95 | 30.1 | 26.7 | 47.8 | 1 | 47.8 | 47.8 | 47.8 |
| 2015 | 129 | 30.3 | 26.7 | 51.8 | 1 | 51.8 | 51.8 | 51.8 |
| 2016 | 33 | 30.7 | 26.7 | 87.6 | 1 | 87.6 | 87.6 | 87.6 |
| 2017 | 18 | 31.3 | 26.7 | 67.7 | 1 | 67.7 | 67.7 | 67.7 |
| 2018 | 22 | 31.0 | 26.7 | 39.8 | 0 | NA | NA | NA |
| 2019 | 1 | 28.7 | 28.7 | 28.7 | 0 | NA | NA | NA |
It can be seen in the Table @ref(tab:years801120) that de-clustered data has zero records for some years. This situation is due to that all the data for each one of those years (2001, 2007, 2011, 2013, 2018, and 2019), belonged to a cluster that started the previous year or finished the next year, and the unique chosen maximum value (the value representative for the cluster) was found in the previous or next year, but not in mentioned years of zero records.
Non-Storm Time Series. ISD Station 801120. Raw Data(L). De-clustered(R)
Using de-clustered data, it is only necessary to calculate optimal threshold for non-thunderstorm data, because there is no records classified as thunderstorm in any data source. Many non-thunderstorm thresholds were tested, to choose the best one using the W statistic, as described in section Thresholding of the Methodology.
Figure @ref(fig:page6) shows a very good fit in resulting W-Statistic plot, for optimal non-thunderstorm threshold \(b_{nt} =\) 42\(\frac{km}{h}\), with a minimum W distance of 0.47, for ISD station 801120, where empirical values (black points) are very close or similar to theoretical values (red line).
POT - Thresholding. ISD Station 801120
Equation @ref(eq:ppintensityfunction), defined in section POT-PP of the Methodological Framework, was used as intensity function \(\lambda(t, y) = \lambda_{nt}(y)\). When shape \(\zeta_t\) is equal to zero (as it is in this study) an equivalent intensity function is described in Equation @ref(eq:ppspecificintensityfunction) defined in terms of the parameters location (\(\omega_t\)), and scale (\(\psi_t\)). Related PDF and CDF functions are referenced in Equations @ref(eq:pppdf), where the domain \(D\) constraint the data above the threshold b, and the time to a non-thunderstorm period, and @ref(eq:ppcdf) respectively.
Intensity function: \(\frac{1}{\psi_{nt}}\exp\left(\frac{-(y-\omega_{nt})}{\psi_{nt}}\right)\)
PDF: \(f(t,y) = \frac{\lambda(t,y)}{\int_D\lambda(t,y)\,dt\,dy}\)
CDF: \(F(t,y) = P(y \leq Y) = \frac{\int_b^Y\lambda(y,t)\,dy}{\int_b^\infty\lambda(y,t)\,dy}\)
After fitting the intensity function to the domain \(D\), the resulting parameters for ISD station 801120, are location \(\omega_t\) equal to -55.62, and scale \(\psi_t\) equal to 23.4. Figure @ref(fig:page8) shows the histogram and fitted PDF in panel A, Q-Q plot (theoretical quantiles against empirical ones) in panel B, empirical cumulative distribution against fitted CDF in panel C, and P-P plot (theoretical probabilities against empirical ones) in panel D. In all four panels, it can be seen that there is a very good visual correspondence between empirical data (points and histogram) and theoretical adjustment (lines).
Goodness of Fit Graphic Diagnosis. Station 801120
Results of formal goodness of fit statistics for ‘Kolmogorov-Smirnov D’, ‘Cramer-von Mises T’ and ‘Anderson-Darling A’ are 0.089, 0.21, and 1.68 respectively. For a proposed null hypothesis, which indicates that the data conforms to a POT-PP, all resulting p-values using statistics D, T and A, confirm that there is no statistical evidence to reject stated hypothesis. Resulting p-value for statistic D is 0.11. Another available criterion to measure the quality of the fitted process are ‘Akaike’s Information Criterion’, and ‘Bayesian Information Criterion’, with values 1505.2, and 1508.4 respectively. The Root Mean Square Error (RMSE), calculated using theoretical versus empirical CDF, is 0.023.
Hazard curve is the solution to Equation @ref(eq:pprl), but eliminating from it elements related to thunderstorms the equation is simplified to \(A_{nt}\int_{Y_N}^{\infty}\lambda_{nt}\left( y\right)\,dy = \frac{1}{N}\), where \(A_{nt}\) is the average time of non-thunderstorm events by year, and \(Y_N\) is the return level or extreme wind velocity, corresponding to the N-years return period or MRI. Replacing in this equation the intensity function \(lambda_{nt}\), and solving \(Y_N\) for all possible values of MRI, will provide hazard curve displayed in Figure @ref(fig:page10).
Hazard Curve. Station 801120
| MRI (years) | Return Level (km/h) |
|---|---|
| 10 | 136.30 |
| 20 | 152.48 |
| 50 | 174.10 |
| 100 | 190.32 |
| 250 | 211.76 |
| 500 | 227.98 |
| 700 | 235.85 |
| 1000 | 244.20 |
| 1700 | 256.61 |
| 3000 | 269.90 |
| 7000 | 289.73 |
Return levels of interest for this research, correspond to 700, 1700 and 3000 years of MRI, however, due to the mechanism of Integration with Existing Hurricane Studies, described in Methodology, it is necessary to extract for all stations values related to typical return periods, as shown in the Table @ref(tab:rl).
To enable a comparison between (a) POT-PP (previous section), (b) POT-Poisson-GPD, and (c) the fitting process of common extreme value distributions (GPA, GEV, GUM) without using POT method, i.e., using the generic concept of Hazard Function HF (see Theoretical Framework), a whole automation process was done to calculate return levels and errors using mentioned alternatives, bearing in mind that in all cases maximum likelihood was used to calculate the parameters.
To this day (Feb 2020), there is no R package available that implements POT-PP, in contrast, there are many packages that implement POT-Poisson-GPD, this reflects that globally, extreme value analyzes are mainly done with the latter. The following SIX R packages were used: (a) extRemes (???), (b) ismev (???), (c) evd (???), (d) Renext (???), (e) evir (???), and (f) fExtremes (???).
Resulting return levels and errors using mentioned packages are reported in Table @ref(tab:comparisonGPD). Similarly, return levels calculated from the adjustment of the probability distributions GPA, GEV, and Gumbel are shown it Table @ref(tab:comparisonCommonEVD).
| 10 | 20 | 50 | 100 | 250 | 500 | 700 | 1000 | 1700 | 3000 | 7000 | RMSE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| extRemes | 155.6 | 169.3 | 187.2 | 200.4 | 217.6 | 230.3 | 236.4 | 242.8 | 252.2 | 262.1 | 276.6 | 0.057 |
| ismev | 155.5 | 169.3 | 187.1 | 200.4 | 217.5 | 230.1 | 236.2 | 242.6 | 252.0 | 261.9 | 276.4 | 0.057 |
| evd | 155.6 | 169.3 | 187.2 | 200.4 | 217.6 | 230.3 | 236.4 | 242.7 | 252.2 | 262.1 | 276.6 | 0.057 |
| Renext Renouv | 155.6 | 169.3 | 187.2 | 200.4 | 217.6 | 230.3 | 236.4 | 242.7 | 252.2 | 262.1 | 276.6 | 0.057 |
| evir | 155.0 | 168.5 | 185.8 | 198.6 | 215.1 | 227.3 | 233.1 | 239.2 | 248.2 | 257.6 | 271.3 | 0.058 |
| fExtremes | 155.5 | 169.3 | 187.2 | 200.4 | 217.5 | 230.2 | 236.3 | 242.6 | 252.0 | 261.9 | 276.5 | 0.057 |
| Renext 2 parameters | 200.8 | 203.9 | 206.5 | 207.8 | 208.9 | 209.4 | 209.6 | 209.7 | 209.9 | 210.1 | 210.3 | 0.337 |
| NAME | 10 | 20 | 50 | 100 | 250 | 500 | 700 | 1000 | 1700 | 3000 | 7000 | RMSE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gpa | Generalized Pareto | 149.6 | 160.6 | 174.2 | 183.9 | 195.8 | 204.2 | 208.2 | 212.2 | 218.0 | 223.9 | 232.2 | 0.048 |
| gev | Generalized Extreme Value | 172.5 | 198.8 | 239.2 | 274.8 | 329.5 | 377.8 | 403.5 | 432.7 | 479.9 | 536.0 | 631.7 | 0.058 |
| gum | Gumbel | 140.9 | 152.1 | 167.0 | 178.2 | 193.0 | 204.3 | 209.7 | 215.5 | 224.1 | 233.3 | 247.0 | 0.067 |
Maps in this section correspond to: (a) existing hurricane maps from previous studies, (b) non-hurricane wind maps created in this study with POT-PP (ERA5 and ISD), and (c) final maps (integrated results of hurricane and non-hurricane studies) for ERA5 and ISD.
The Colombian consulting firm Ingeniar Ltda, following the methodology described in (???), and (???), has provided raster wind maps for return periods 10, 20, 50, 100, 250, 500, 700, 1000, 1700, 3000, and 7000 years, resulting from the probabilistic study of winds due to hurricanes in the Colombian Caribbean Sea and the surrounding continental zone. Figure @ref(fig:ingeniarmaps) shows three of mentioned maps.
Ingeniar Hurricane Wind Maps
Using POT-PP to calculate RL in ISD stations, continuous maps covering the study area were created using Spatial Interpolation Techniques as described in Methodology.
ISD Non-Hurricane Wind Maps
Applying POT-PP to the variable ten meters wind gust fg10 from ERA5 reanalysis dataset, RL were calculated. As RL for ERA5 represent predefined square cells of \(0.25^\circ\) decimal degrees, no interpolation process was required.
ERA5 Non-Hurricane Wind Maps
Following the procedure described in Integration with Hurricane Data, final wind maps were created, combining existing data for hurricane studies, and non-hurricane maps produced in this study.
ISD Hurricane & Non-Hurricane Wind Maps
ERA5 Hurricane & Non-Hurricane Wind Maps
The main difference between POT-PP and POT-Poisson-GPD is that in the latter, wind quantities are adjusted to a GPD, and the time is adjusted to a Poisson Process (1D), while in the former, time and magnitude are adjusted to a Poisson Process (2D). If there is no weather classification available (storm and no storm) in the wind time series, POT-PP loses its advantages and resembles in potential and scope to POT-Poisson-GPD, because the intensity function varying only in magnitude, becomes similar to a GPD. For this reason, POT-PP method is really useful, if historical classification of time series is available, record by record, in storm and non-storm.
Regarding the comparison of the data, it must be remembered that the basis of comparison, that is, the one that represents the truth in the field - IDEAM field measurements, was not fully available, what disturbed the process since before starting it. On the other hand, there are many uncertainties with respect to the model that represents the ISD database, because first, the available documentation does not specify whether it is an average or a gust data, second, the comparative graphs showed that ISD database did not represent a continuous variable (vertical or horizontal stripes in scatter plots), and finally, the comparisons against IDEAM never showed good results.
With respect to ERA5 database, although the comparative results showed greater similarity, it should be remembered that each record in the time series does not represent a point value, on the contrary, it represents a square cell of 0.25 decimal degrees. The IDEAM stations with which the comparison was made, can fall into any location of the cell, and constitute only a very local condition, that is not represented by an average forecast for the whole cell size. This considers that Colombia is a tropical region with a widely diverse territory (mountains, plains, rivers, forests, etc.) and climate. So the possible similarity between IDEAM and ERA5 is limited by this condition.
This classification by time of historical series, is useful because it allows to define more precisely the average rate of events per year (Poisson process rate), which in POT-PP is represented by the average amount of events time per year, this is, components \(A_t\) and \(A_{nt}\) of Equation @ref(eq:pprl) \(A_t\int_{Y_N}^{\infty}\lambda_t\left( y\right)\,dy + A_{nt}\int_{Y_N}^{\infty}\lambda_{nt}\left( y\right)\,dy = \frac{1}{N}\), used to calculate return levels \(Y_N\).
By the lack of thunderstorm and non-thunderstorm information, it is impossible to calculate which part of the annual time belongs to storm and which to non-storm. As the available data were all assumed as non-thunderstorm data, this average time of events per year will always result in a fixed wrong value of 365 days, the maximum possible value. For ISD, this condition is reflected in high and unlikely final results. However, this condition did not affect the results in the ERA5 database in the same way. Although also in ERA5, all the data were classified as “non-thunderstorm”, and the average time of events was always 365 days, an additional condition made the final result more realistic. Contrary to what happens in the ISD database, where time series have many gaps and there is lack of information, used ERA5 database has the full time series, hour by hour, from 1979 to 2019. Following the theory behind POT-PP, this implies that there is only one cluster for the whole time series, which would leave a single data after the de-clustering, canceling the entire subsequent process. Our proposed solution was to work only with one data per week, the maximum, which implies that the de-clustering process will have no effect, since it is based on 4-day gaps, resulting in more events above the threshold, exactly 48 events for each of the 40 years of history, which translates into greater averaging of the final wind data.
One of the objectives of the investigation was to compare different methods in the calculation of return levels, and that was achieved using in all cases, both POT-Poisson-GPD, and POT-PP. The Poisson for POT-Poisson-GPD, does not accept data classification by time (storm or non-storm), because it is one-dimensional and data must represent a single general type of event: wind. Despite that, it is important to emphasize that the shortcomings in the calculation of the Poisson process rate, similarly affected the application of both methods, so in all cases, the results have the same limitation, and the use of POT-Poisson-GPD does not represent best quality in the results.
In relation to the use of PostgreSQL to store time series, seeking to decrease RAM load to avoid memory overflow or slow processes, it can be concluded that although it is an efficient solution, it is also a delayed solution in terms of its implementation effort (process text files, load them, and unstack them in the database). In addition, replication of research for future work is more difficult by the need to install the database engine and restore the database backup, compared to the simple use of text files. It is recommended to use a database for analysis and processing with R and tidyverse lazy tibble data-frame technologies, when data sources are already stored in a database as PostgreSQL.
Final recommendation to optimize memory management when time series are processed are: (a) to work directly with text files, but make a good code design avoiding to load all time series into memory at the same time, i.e., serial processing file by file, (b) make sure to remove used objects from memory rm("object"), (c) make memory garbage collection gc() to frees up unwanted objects, or (d) reuse the same variables in each cycle of the process, to use always the same previously allocated memory spaces. Next chunk of code shows some important memory management options.
#List objects in current environment
ls()
#Remove all objects in current environment
rm(list=ls())
#After removing object, apply garbage collection
gc()
#List object sorted by size
sort(sapply(ls(),function(x){object.size(get(x))}))
#Remove an specific object
rm("name")
#Review some memory values
library(pryr)
mem_used()
memory.size()
memory.limit()
#Change memory limit
memory.limit(size = value)
For big NetCDF files, as in this study where variable fg10 (3-s wind gust) stored 3.381 (49 cols x 69 rows) cells and in each cell 356.472 hours, representing 1.205’231.832 different wind records in a file size of 3.53 GB, it is recommended to get specific needed slices instead of loading the entire file into memory. For instance, use parameters start, and count of function ncvar_get (R package ncdf4) to get desired NetCDF records, as shown in code below where time series of first ERA5 cell/station (upper-left) is assigned to variable firstcelltimeseries.
library(ncdf4)
ncname <- "outfile_nc4c_zip9.nc"
ncin <- nc_open(filename)
#First column
firstcol=1
#First row
firstrow=1
#First time
firsttime = 1
#Number of cols to get
ncols=1
#Number of rows to get
nrows=1
#Total number of time records (time dimension)
timedim <- dim(ncvar_get(ncin, "time"))
#Number of time records to get
ntime = timedim
firstcelltimeseries = ncvar_get(nc=ncin, varid='fg10',
start=c(firstcol,firstrow,firsttime),
count=c(ncols,nrows,ntime))
To improve the quality of extreme wind analysis in future research, the inclusion of seasonal effects is recommended. This can be done in two ways, first, using a separate POT-PP for each season, and second, model the Poisson process parameters (location, scale and shape) as sinusoidal functions of time. Finally, it is possible to include more formal statistics (not only graphics) using reliable and complete measured data to face the downscaling challenge.
The conclusions of this study are:
Final maps using ERA5 forecast database, representing return periods of 700, 1700, and 3000 years, are the extreme velocities needed as input load for the design of structures of different use category in the study area. Nevertheless, by one hand, full data from the IDEAM source is needed to enable the validation of downscaling support, on the other hand, it is essential to include in the study the classification of thunderstorm and non-thunderstorm data to achieve more realistic results, and finally, an additional conservative calibration process is needed, where to each municipality is assigned only a wind velocity in order to define final values that will be part of the structure design norm.
In the absence of wind field measurements, alternatives data sources as ISD and ERA5 can be a viable source of data to calculate extreme wind events, but always must be searched for statistic or graphic support for the downscaling issue, and at the end a process of calibration is needed for each particular case.
A powerful R tool was implemented to apply extreme value analysis using POT-PP method allowing the calculation of comparative results with POT-Poisson-GPD approach (using existing packages).
Results of this research could be the starting point of a process to update wind maps in countries with information issues.
Output results of this research will contribute to reduce the risk of infrastructure collapse, representing a favorable impact in human lives, material losses, and disaster prevention.
For a detailed analysis of the results, refer to Results and Discussion section, and for a discussion about the project and its relevant topics, refer to Final Discussion.
0em
| Excel Sheet Name | Description | Important |
|---|---|---|
| nt_evd-fgev_fGumbel | Non thunderstorm. Fitting Gumbel using evd::fgev | Do not use |
| nt_bbmle-mle2 | Non thunderstorm. Fitting Gumbel using bbmle::mle2 | Do not use |
| nt_nll-optim | Non thunderstorm. Fitting Gumbel using negative likelihood and stats::optim | Do not use |
| nt_fitdistrplus-fitdist | Non thunderstorm. Fitting Gumbel using fitdistrplus::fitdist | Do not use |
| nt_extRemes | Non thunderstorm. Calculation of return levels POT-GPD, using extRemes::fevd | Do not use |
| nt_distLquantile_quant | Non thunderstorm. Calculation of return levels and RMSE (POT-GPD and EVDs), using extremeStat::distLquantile | To compare with POT-PP |
| nt_distLquantile_parameters | Non thunderstorm. Calculation of fitting parameters POT-GPD and EVD, using extremeStat::distLquantile | To compare with POT-PP |
| nt_distLextreme_returnlev | Non thunderstorm. Calculation of return levels POT-GPD and EVD, using extremeStat::distLextreme | To compare with POT-PP |
| nt_distLextreme_parameter | Non thunderstorm. Calculation of fitting parameters POT-GPD and EVD, using extremeStat::distLextreme | To compare with POT-PP |
| nt_POT-GPD-Equivalent | Non thunderstorm. For POT-PP and using POT-GPD equivalent. Calculation of Goodness of Fit and RMSE | Use as Goodness of Fit of POT-PP |
| Excel Sheet Name | Description |
|---|---|
| all_years | Raw data time series statistics by year |
| all_months | Raw data time series statistics by month |
| all_weeks | Raw data time series statistics by week |
| all_gaps30days | Raw data gaps of 30 days of more |
| nt_years | Non-thunderstorm time series statistics by year |
| nt_months | Non-thunderstorm time series statistics by month |
| nt_weeks | Non-thunderstorm time series statistics by week |
| nt_gaps30days | Non-thunderstorm gaps of 30 days of more |
| IMP.VALS | Main statistics of dataset after de-clustering and thresholding |
| declu_nt_years | Non-thunderstorm time series statistics by year, after de-clustering and thresholding |
| declu_nt_months | Non-thunderstorm time series statistics by month, after de-clustering and thresholding |
| declu_nt_weeks | Non-thunderstorm time series statistics by week, after de-clustering and thresholding |
| declu_nt_gaps30days | Non-thunderstorm gaps of 30 days of more, after de-clustering and thresholding |
| Graphic | Description |
|---|---|
| Page 1 | Time Series Plot for Raw Data |
| Page 2 | Time Series Plot for Non-Thunderstorm (‘nt’) |
| Page 3 | Log-Likelihood(Gumbel) - Optim (nll-optim) |
| Page 4 | De-clustered - Non-Thunderstorm - fitdistrplus-fitdist(gumbel) |
| Page 5 | De-clustered Non-Thunderstorm (‘nt’) Time Series |
| Page 6 | W-Statistic Plot for best pair of thresholds |
| Page 7 | De-clustered - Non-Thunderstorm - Density Function from Intensity Function of Poisson Process |
| Page 8 | De-clustered - Non-Thunderstorm - POT-GPD Equivalent |
| Page 9 | De-clustered - Non-Thuderstorm - Cumulative Distribution Function from Intensity Function of Poisson Process |
| Page 10 | De-clustered - Non-Thunderstorms - Hazard Curve. Poisson Process Intensity Function |
| Page 11 | De-clustered - Non-Thunderstorms - Hazard Curve. Gumbel like tail Intensity Function of Poisson Process |
| Page 12 | Non-Thunderstorms. Gumbel Density Function, but using parameters of Poisson Process |
| Page 13 | Non-Thunderstorms. Gumbel Cumulative Distribution, but using parameters of Poisson Process |
| Page 14 | De-clustered Non-Thunderstorm. Fitted Gumbel density function using parameters of Poisson Process |
| Page 15 | De-clustered - Non-Thunderstorms - Hazard Curve. Gumbel Quantile Function using parameters of Poisson Process |
| Column ID | Columns Name | Important | Description |
|---|---|---|---|
| 1 | id | Consecutive Row ID | |
| 2 | t_thresh | Not Available | Threshold for thunderstorm data |
| 3 | t_mu_location | Not Available | Location for thunderstorm data |
| 4 | t_psi_scale | Not Available | Scale for thunderstorm data |
| 5 | nt_thresh | Threshold for non-thunderstorm data | |
| 6 | nt_mu_location | Location for non-thunderstorm data | |
| 7 | nt_psi_scale | Scale for non-thunderstorm data | |
| 8 | distance_w | Minimum W distance to choose best threshold pairs | |
| 9 | station | Station ID | |
| 10 to 20 | t_MRI_poissonprocessintfunc | Not Available | Thunderstorm Return levels for MRIs (10, 20, 50, 100, 250, 500, 700, 1000, 1700, 3000, 7000), using Poisson Process Intensity Function |
| 21 to 31 | t_MRI_gumbeltailintfunc | Not Available | Thunderstorm Return levels for MRIs (10, 20, 50, 100, 250, 500, 700, 1000, 1700, 3000, 7000), using PP Gumbel Tail Intensity Function |
| 32 to 42 | t_MRI_gumbelquantilefunc | Not Available | Thunderstorm Return levels for MRIs (10, 20, 50, 100, 250, 500, 700, 1000, 1700, 3000, 7000), using PP Gumbel Quantile Function |
| 43 to 53 | nt_MRI_poissonprocessintfunc | Used to create maps! | Non-thunderstorm Return levels for MRIs (10, 20, 50, 100, 250, 500, 700, 1000, 1700, 3000, 7000), using Poisson Process Intensity Function |
| 54 to 64 | nt_MRI_gumbeltailintfunc | Do not use | Non-thunderstorm Return levels for MRIs (10, 20, 50, 100, 250, 500, 700, 1000, 1700, 3000, 7000), using PP Gumbel Tail Intensity Function |
| 65 to 75 | nt_MRI_gumbelquantilefunc | Do not use | Non-thunderstorm Return levels for MRIs (10, 20, 50, 100, 250, 500, 700, 1000, 1700, 3000, 7000), using PP Gumbel Quantile Function |
| 76 to 86 | tnt_MRI_poissonprocessintfunc | Not Available | Combined (t and nt) Return levels for MRIs (10, 20, 50, 100, 250, 500, 700, 1000, 1700, 3000, 7000), using Poisson Process Intensity Function |
| File | Description |
|---|---|
| rl_nonhurricanes_4326_10_st.tif | ERA5 POT-PP non-hurricane wind map. MRI 10 years |
| rl_nonhurricanes_4326_20_st.tif | ERA5 POT-PP non-hurricane wind map. MRI 20 years |
| rl_nonhurricanes_4326_50_st.tif | ERA5 POT-PP non-hurricane wind map. MRI 50 years |
| rl_nonhurricanes_4326_100_st.tif | ERA5 POT-PP non-hurricane wind map. MRI 100 years |
| rl_nonhurricanes_4326_250_st.tif | ERA5 POT-PP non-hurricane wind map. MRI 250 years |
| rl_nonhurricanes_4326_500_st.tif | ERA5 POT-PP non-hurricane wind map. MRI 500 years |
| rl_nonhurricanes_4326_700_st.tif | ERA5 POT-PP non-hurricane wind map. MRI 700 years |
| rl_nonhurricanes_4326_1000_st.tif | ERA5 POT-PP non-hurricane wind map. MRI 1000 years |
| rl_nonhurricanes_4326_1700_st.tif | ERA5 POT-PP non-hurricane wind map. MRI 1700 years |
| rl_nonhurricanes_4326_3000_st.tif | ERA5 POT-PP non-hurricane wind map. MRI 3000 years |
| rl_nonhurricanes_4326_7000_st.tif | ERA5 POT-PP non-hurricane wind map. MRI 7000 years |
| rl_combined_4326_10_st.tif | ERA5 final wind map (hurricane + non-hurricane). MRI 10 years |
| rl_combined_4326_20_st.tif | ERA5 final wind map (hurricane + non-hurricane). MRI 210 years |
| rl_combined_4326_50_st.tif | ERA5 final wind map (hurricane + non-hurricane). MRI 50 years |
| rl_combined_4326_100_st.tif | ERA5 final wind map (hurricane + non-hurricane). MRI 100 years |
| rl_combined_4326_250_st.tif | ERA5 final wind map (hurricane + non-hurricane). MRI 250 years |
| rl_combined_4326_500_st.tif | ERA5 final wind map (hurricane + non-hurricane). MRI 500 years |
| rl_combined_4326_700_st.tif | ERA5 final wind map (hurricane + non-hurricane). MRI 700 years |
| rl_combined_4326_1000_st.tif | ERA5 final wind map (hurricane + non-hurricane). MRI 1000 years |
| rl_combined_4326_1700_st.tif | ERA5 final wind map (hurricane + non-hurricane). MRI 1700 years |
| rl_combined_4326_3000_st.tif | ERA5 final wind map (hurricane + non-hurricane). MRI 3000 years |
| rl_combined_4326_7000_st.tif | ERA5 final wind map (hurricane + non-hurricane). MRI 7000 years |
| File | Description |
|---|---|
| nh_10.tif | ISD POT-PP non-hurricane wind map. MRI 10 years |
| nh_20.tif | ISD POT-PP non-hurricane wind map. MRI 20 years |
| nh_50.tif | ISD POT-PP non-hurricane wind map. MRI 50 years |
| nh_100.tif | ISD POT-PP non-hurricane wind map. MRI 100 years |
| nh_250.tif | ISD POT-PP non-hurricane wind map. MRI 250 years |
| nh_500.tif | ISD POT-PP non-hurricane wind map. MRI 500 years |
| nh_700.tif | ISD POT-PP non-hurricane wind map. MRI 700 years |
| nh_1000.tif | ISD POT-PP non-hurricane wind map. MRI 1000 years |
| nh_1700.tif | ISD POT-PP non-hurricane wind map. MRI 1700 years |
| nh_3000.tif | ISD POT-PP non-hurricane wind map. MRI 3000 years |
| nh_7000.tif | ISD POT-PP non-hurricane wind map. MRI 7000 years |
| isd_combined_4326_10_st.tif | ISD final wind map (hurricane + non-hurricane). MRI 10 years |
| isd_combined_4326_20_st.tif | ISD final wind map (hurricane + non-hurricane). MRI 210 years |
| isd_combined_4326_50_st.tif | ISD final wind map (hurricane + non-hurricane). MRI 50 years |
| isd_combined_4326_100_st.tif | ISD final wind map (hurricane + non-hurricane). MRI 100 years |
| isd_combined_4326_250_st.tif | ISD final wind map (hurricane + non-hurricane). MRI 250 years |
| isd_combined_4326_500_st.tif | ISD final wind map (hurricane + non-hurricane). MRI 500 years |
| isd_combined_4326_700_st.tif | ISD final wind map (hurricane + non-hurricane). MRI 700 years |
| isd_combined_4326_1000_st.tif | ISD final wind map (hurricane + non-hurricane). MRI 1000 years |
| isd_combined_4326_1700_st.tif | ISD final wind map (hurricane + non-hurricane). MRI 1700 years |
| isd_combined_4326_3000_st.tif | ISD final wind map (hurricane + non-hurricane). MRI 3000 years |
| isd_combined_4326_7000_st.tif | ISD final wind map (hurricane + non-hurricane). MRI 7000 years |
The European Center for Medium-Range Weather Forecasts - ECMWF had implemented the Climate Data Storage - CDS https://cds.climate.copernicus.eu/, where all its datasets can be downloaded, however there is a straightforward way to get ERA5 data through Python library CDSAPI. Before to use CDSAPI, it is necessary to research about names and meanings of ERA5 variables using the official data documentation web page https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation, or the parameter database https://apps.ecmwf.int/codes/grib/param-db, that includes all ECWMF data sources, not only ERA5.
Next block of code shows the use of CDSAPI (Python) to download ERA5 variable 10fg - 10 meters wind gust, from 1979 to 1991 for Colombia area. The most important keywords there, allow to define ERA5 data source and product type, variable name ‘variable’, format NetCDF ‘format’, area of interest ‘area’, using WGS88 coordinates in the format north, west, south, east, cell size ‘grid’ in decimal degrees, and all the keywords related to time ‘year’, ‘month’, ‘day’, ‘time’.
import cdsapi
c = cdsapi.Client()
c.retrieve('reanalysis-era5-single-levels',{
'product_type':'reanalysis',
'format':'netcdf',
'variable':'10m_wind_gust_since_previous_post_processing',
'year':['1979','1980','1981','1982','1983','1984','1985','1986','1987','1988','1989','1990','1991'],
'month':['01','02','03','04','05','06','07','08','09','10','11','12'],
'time':['00:00','01:00','02:00','03:00','04:00','05:00','06:00','07:00','08:00','09:00','10:00','11:00',
'12:00','13:00','14:00','15:00','16:00','17:00','18:00','19:00','20:00','21:00','22:00','23:00'],
'day':['01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18',
'19','20','21','22','23','24','25','26','27','28','29','30','31'],
'area':[12.5, -79.1, -4.5, -66.8], # North, West, South, East.
'grid':[0.25,0.25]},
'10fg_1979_1991_netcdf_.25x.25.nc')
Next Table @ref(tab:pythonera5), shows all Python scripts used to download variables 10fg - 10 meters wind gust and fsr - forecast surface roughness, for the study area, along the period 1979 to 2019. Last file in the table, hold a summary of commands to manipulate NetCDF files.
After downloading separate NetCDF files, there are different tools available, to manipulate them, for instance, Climate Data Operators - CDO https://code.mpimet.mpg.de/projects/cdo/, NetCDF command line utilies https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_working_with_netcdf_files.html, and NetCDF operator - NCO http://nco.sourceforge.net/.
Next CDO command, will join by time all NetCDF files inside a folder, where -f defines the file format, -b defines the data format, and -z defines the compression level (from 1 to 9). The resulting file name is outfile_nc4c_zip9.nc. Then with cdo griddes is possible to review output file content.
cdo -f nc4c -b F32 -z zip_9 mergetime *.nc outfile_nc4c_zip9.nc
cdo griddes outfile_nc4c_zip9.nc
Using NetCDF command line utilities, next commands will explore the content of a file, and change file format.
ncdump -h file.nc
nccopy -k 'nc4' -d 9 file.nc file_d9.nc
In Linux, next NCO commands will extract variable p0001 from file.nc to p0001.nc, rename variables p0001 to fsr in file p0001.nc, and view the result.
ncks -v p0001 file.nc p0001.nc
ncrename -v p0001,fsr p0001.nc
ncview p0001.nc
To optimize memory usage when running R code, some information related to IDEAM and ISD data sources, like stations and wind time series, were stored in PostgreSQL database, thanks to the possibility of lazy evaluation with tidyverse R package dplyr, see (???), and (???), which do not load the entire dataset in memory from the beginning, this is, “it delays the actual operation until necessary and loads data onto R from the database only when we need it” (???). In parallel, it was used throughout the investigation, the tidyverse R package tibble, which allow lazy and surly data-frames in R, namely “they do less …, and complain more” (???). As a reference, there is an R package called tidyverse, which installs and loads all related packages (ggplot2, tibble, tidyr, readr, and dplyr), see (???), and (???).
R Package dplyr can be used to load data-frames created with text files, to tables inside databases, using for instance, copy_to function, see (???), nonetheless an alternative procedure was done with MS-DOS commands, and SQL scripts using pgsql (terminal-based front-end to PostgreSQL). This procedure is described below for IDEAM data VV_AUT_10 - instantaneous wind velocity each ten (10) minutes, see Table @ref(tab:tabledatasources2).
There is one text file for each station time series in VV_AUT_10. There are a total of 204 stations, that is 204 files. The file name follow the format VV_AUT_10@*.data, where * is replaced by the station identifier, as can be seen below.
VV_AUT_10@31095030.data
VV_AUT_10@29065130.data
VV_AUT_10@29065140.data
...
Below is the partial content of a time series file (VV_AUT_10@31095030.data). It has two columns (Fecha and Valor) separated by character |.
Fecha|Valor
2008-10-29 10:30:00|0.9
2008-10-29 10:40:00|1.4
2008-10-29 10:50:00|1.2
2008-10-29 11:00:00|1.5
...
Install PostgreSQL 10.5
Create database with next credentials
| Credential | Value |
|---|---|
| dbname | winddata |
| host | localhost |
| pot | 5432 |
| user | user1 |
| password | user1 |
-- Database: winddata
-- DROP DATABASE winddata;
CREATE DATABASE winddata
WITH
OWNER = user1
ENCODING = 'UTF8'
LC_COLLATE = 'English_United States.1252'
LC_CTYPE = 'English_United States.1252'
TABLESPACE = pg_default
CONNECTION LIMIT = -1;
CREATE TABLE public."TEMP" (
txtfecha VARCHAR(30),
valor NUMERIC(6,2) ) WITH (oids = false);
ALTER TABLE public."TEMP"
OWNER TO user1;
CREATE TABLE public."VV_AUT_10" (
id BIGSERIAL,
estationid BIGINT,
txtfecha VARCHAR(30),
valor NUMERIC(6,2),
"timestamp" TIMESTAMP(0) WITHOUT TIME ZONE ) WITH (oids = false);
ALTER TABLE public."VV_AUT_10"
OWNER TO user1;
REM 1: consecutive id
REM 2: file name
REM 3: station id
echo --(%1 of 204)>>"load_vv_aut_10.logdos"
echo --(%1 of 204)
echo Empty table TEMP
echo psql -L "load_vv_aut_10.logquery" -c "DELETE FROM public.\"TEMP\""
"postgresql://user1:user1@127.0.0.1/winddata" >>"load_vv_aut_10.logdos"
psql -L "load_vv_aut_10.logquery" -c "DELETE FROM public.\"TEMP\""
"postgresql://user1:user1@127.0.0.1/winddata" >>"load_vv_aut_10.logdos"
echo Copy from FILE to TEMP
echo psql -L "load_vv_aut_10.logquery" -c "COPY public.\"TEMP\" FROM 'modified\%2'
(FORMAT CSV, DELIMITER'|', HEADER)" "postgresql://postgres:postgres@127.0.0.1/winddata"
>>"load_vv_aut_10.logdos"
psql -L "load_vv_aut_10.logquery" -c "COPY public.\"TEMP\" FROM 'modified\%2'
(FORMAT CSV, DELIMITER'|', HEADER)" "postgresql://postgres:postgres@127.0.0.1/winddata"
>>"load_vv_aut_10.logdos"
echo Insert from TEMP into VV_AUT_10
echo psql -L "load_vv_aut_10.logquery" -c "INSERT INTO public.\"VV_AUT_10\"(estationid, txtfecha, valor,
timestamp) SELECT %3, txtfecha, valor, TO_TIMESTAMP(txtfecha, 'YYYY-MM-DD HH24:MI:SS') FROM
public.\"TEMP\"" "postgresql://postgres:postgres@127.0.0.1/winddata" >>"load_vv_aut_10.logdos"
psql -L "load_vv_aut_10.logquery" -c "INSERT INTO public.\"VV_AUT_10\"(estationid, txtfecha, valor,
timestamp) SELECT %3, txtfecha, valor, TO_TIMESTAMP(txtfecha, 'YYYY-MM-DD HH24:MI:SS') FROM
public.\"TEMP\"" "postgresql://postgres:postgres@127.0.0.1/winddata" >>"load_vv_aut_10.logdos"
echo HECHO! >>"load_vv_aut_10.logdos"
echo HECHO!
echo .. >>"load_vv_aut_10.logdos"
echo ..
echo .>>"load_vv_aut_10.logdos"
echo .
Copy previous MS-DOS commands into a text file called load_vv_aut_10.bat. To use this file three parameters need to be passed, 1) consecutive id, 2) file name, and 3) station id. All results (after call it) are stored in a log file named load_vv_aut_10.logdos. All SQL commands including the pgsql answer/result after code execution, are stored in a log file named load_vv_aut_10.logquery. The script, first, keep a consecutive record of the procedure inside the log file, second, empty the contents of the Temp table, then, copy from text file to table Temp, and finally, from Temp inserts into VV_AUT_10. In all script steps, detailed output logs are sent to files load_vv_aut_10.logdos, and load_vv_aut_10.logquery.
Call previous file using MS-DOS commands as shown below, where first parameters are a consecutive identifier, second is time series file name, and third one is the station identifier.
call load_vv_aut_10.bat 1 VV_AUT_10_57025020.csv 57025020
call load_vv_aut_10.bat 2 VV_AUT_10_31095030.csv 31095030
call load_vv_aut_10.bat 3 VV_AUT_10_29065130.csv 29065130
call load_vv_aut_10.bat 4 VV_AUT_10_29065140.csv 29065140
...
Consider following recommendations
c:
cd load_data
Using previous procedure, following ISD and IDEAM information were loaded to the database, see Table @ref(tab:tabledatasources2):
Previous listed tables have stacked stations information, namely, with the same number of columns, the records of the first station occupy a certain number of rows, and the station identifier is repeated in each row, and below them, increasing the number of rows, the information of other stations is stored.
Previous stacked tables were transform to unstacked versions, namely, one column for each station identifier. Stacked table ALL_VVMX_AUT_60 was transformed to unstacked table IDEAM_VVMX_60, using next chunk of SQL code.
CREATE table ideam_vvmx_60 AS
SELECT * FROM crosstab(
'SELECT DISTINCT
timestamp,
estacionid,
valor
FROM
public."ALL_VVMX_AUT_60"
ORDER BY timestamp, estacionid',
'SELECT DISTINCT
estacionid
FROM
public."ALL_VVMX_AUT_60"
ORDER BY estacionid')
AS ct(mydatetime timestamp, "11105020" real, "11135030" real, "12015100" real, "12015110" real,
"13035501" real, "13085050" real, "14015080" real, "15015120" real, "15065180" real, "15065190" real,
"15065501" real, "15075150" real, "15075501" real, "15079010" real, "15085050" real, "16015110" real,
"16015120" real, "16015130" real, "16015501" real, "16055120" real, "17015010" real, "17035010" real,
"21015030" real, "21015040" real, "21015050" real, "21015060" real, "21015070" real, "21055070" real,
"21105030" real, "21115010" real, "21115170" real, "21115180" real, "21145080" real, "21185090" real,
"21195160" real, "21195170" real, "21195190" real, "21205012" real, "21205710" real, "21205791" real,
"21205910" real, "21205940" real, "21206280" real, "21206600" real, "21206790" real, "21206920" real,
"21206930" real, "21206940" real, "21206950" real, "21206960" real, "21206980" real, "21206990" real,
"21215150" real, "21215160" real, "21215170" real, "21215180" real, "21215190" real, "21235030" real,
"21255160" real, "21255170" real, "22025040" real, "22075050" real, "23035030" real, "23065180" real,
"23065190" real, "23085260" real, "23085270" real, "23105060" real, "23105070" real, "23125160" real,
"23125170" real, "23195190" real, "23195230" real, "23195240" real, "23195502" real, "24015110" real,
"24015380" real, "24025090" real, "24035360" real, "24035370" real, "24035380" real, "24035390" real,
"24035410" real, "24035430" real, "24055070" real, "24055080" real, "25025000" real, "25025002" real,
"25025030" real, "25025280" real, "25025340" real, "25025350" real, "25025360" real, "25025380" real,
"26015010" real, "26015030" real, "26035090" real, "26035100" real, "26055100" real, "26055110" real,
"26055120" real, "26075120" real, "26075150" real, "26085160" real, "26085170" real, "26095320" real,
"26105240" real, "26105250" real, "26115090" real, "26125061" real, "26125290" real, "26125300" real,
"26125710" real, "26135290" real, "26135300" real, "26135310" real, "26135320" real, "26135330" real,
"26145090" real, "26155220" real, "26155230" real, "26155240" real, "26185030" real, "26185050" real,
"26225060" real, "26255030" real, "27015280" real, "27015290" real, "27015300" real, "27015310" real,
"27015320" real, "27015330" real, "28025120" real, "28025130" real, "28025502" real, "28035060" real,
"29004520" real, "29015000" real, "29015040" real, "29035000" real, "29035200" real, "29045000" real,
"29045150" real, "29045190" real, "29065000" real, "29065120" real, "29065130" real, "31095030" real,
"32105080" real, "35025080" real, "35025090" real, "35025110" real, "35035100" real, "35035110" real,
"35035130" real, "35075070" real, "35075080" real, "35085060" real, "35085070" real, "35085080" real,
"35095120" real, "35095130" real, "35165000" real, "35185010" real, "35195060" real, "35215020" real,
"35215030" real, "35225030" real, "35235040" real, "35235050" real, "36015020" real, "37015030" real,
"44015060" real, "44015070" real, "44035040" real, "44035050" real, "46015030" real, "46035010" real,
"47035030" real, "48015040" real, "48015050" real, "51025060" real, "51025080" real, "52015050" real,
"52025080" real, "52025090" real, "52035040" real, "52045080" real, "52055150" real, "52055160" real,
"52055170" real, "52055210" real, "52055220" real, "52055230" real, "53045040" real, "53075020" real,
"54077210" real, "55015010" real, "56019010" real, "57025020" real, "1111500036" real,
"2612500038" real, "3706500109" real)
Next list shows unstacked tables inside the database:
In addition, stations catalogs of IDEAM and ISD were loaded as tables. Main fields are identifier, name, latitude and longitude:
Next chunk of code shows the use of dplyr and tibble packages with PostgreSQL tables
library(dplyr)
con1 = src_postgres(dbname = "winddata", host = "localhost",
port = 5432, user = "user1", password = "user1")
originalfields3 = c("id", "usaf", "station_name", "latitud", "longitud")
originalfields3 = paste(originalfields3, collapse= ", ", sep = "")
query3 = paste("select", originalfields3,
"from isd_all_stations where usaf_isd_dataua != ''", sep=" ")
isd_stations = as_tibble(tbl(con1, sql(query3)))
library(sf)
isd_stations = st_as_sf(isd_stations, coords = c("longitud", "latitud"), crs = 4326)
Object isd_stations belongs to classes sf, tbl_df, tbl, and data.frame. Its description is shown below using the object name.
class(isd_stations)
## [1] "sf" "tbl_df" "tbl" "data.frame"
isd_stations
## Simple feature collection with 100 features and 3 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -81.711 ymin: -5.894 xmax: 0 ymax: 13.357
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 100 x 4
## id usaf station_name geometry
## <int> <chr> <chr> <POINT [°]>
## 1 13848 690186 SAN JOSE DEL GUAVIA (-72.633 2.567)
## 2 14080 698689 MARANDUA AB (-68.686 5.524)
## 3 14081 698704 AFWA ASSIGNED (-74.633 4.217)
## 4 21913 789820 REINA BEATRIX INTL (-70.015 12.501)
## 5 21918 800000 BOGUS COLOMBIAN (0 0)
## 6 21919 800010 GUSTAVO ROJAS PINILLA (-81.711 12.584)
## 7 21921 800020 EL EMBRUJO (-81.358 13.357)
## 8 21925 800090 SIMON BOLIVAR (-74.231 11.12)
## 9 21926 800220 RAFAEL NUNEZ (-75.513 10.442)
## 10 21927 800280 ERNESTO CORTISSOZ (-74.781 10.89)
## # ... with 90 more rows
Next chunks of commands were used to create database backup.
pg_dump.exe -f dump-C-E-o-s_winddata_2020_03.sql -C -E UTF8 -o -s -h localhost -U postgres winddata
Make a backup with option Fc (to load with pg_restore). Be aware of the use of option “copy” (not the option insert –D).
pg_dump.exe -Fc -a -E UTF8 -h localhost -U postgres -f dump-Fc-a-E_winddata_2020_03.dump winddata
It is possible to load data from a backup file to the database, using a TOC text file containing the list of tables to be restored. The user can edit this file to decide which tables to restore or change the restoration order. TOC file created with next command, will have the load order by default (this command only works with backups created with the options Fc o Ft)
pg_restore.exe -l -f dump-Fc-a-E_winddata_2020_03.TOC dump-Fc-a-E_winddata_2020_03.dump
The content of default TOC file (dump-Fc-a-E_winddata_2020_03.TOC), as result from previous command is shown below
; Archive created at 2020-03-25 15:38:55
; dbname: winddata
; TOC Entries: 17
; Compression: -1
; Dump Version: 1.13-0
; Format: CUSTOM
; Integer: 4 bytes
; Offset: 8 bytes
; Dumped from database version: 10.5
; Dumped by pg_dump version: 10.5
; Selected TOC Entries:
2865; 0 259855 TABLE DATA public ALL_VVMX_AUT_60 user1
2868; 0 259883 TABLE DATA public TEMP user1
2867; 0 259879 TABLE DATA public VV_AUT_10 user1
2875; 0 260005 TABLE DATA public ideam_all_stations user1
2876; 0 260092 TABLE DATA public ideam_vvmx_60 user1
2874; 0 259958 TABLE DATA public isd_all_stations user1
2871; 0 259898 TABLE DATA public isd_lite user1
2873; 0 259926 TABLE DATA public isd_lite_copy user1
2869; 0 259892 TABLE DATA public isd_lite_string user1
2877; 0 260096 TABLE DATA public isd_lite_unstack user1
2883; 0 0 SEQUENCE SET public ALL_VVMX_AUT_60_id_seq user1
2884; 0 0 SEQUENCE SET public ISD_LITE_ID_seq user1
2885; 0 0 SEQUENCE SET public VV_AUT_10_id_seq user1
2886; 0 0 SEQUENCE SET public isd_lite_new_id_seq user1
Before to restore the schema from file dump-C-E-o-s_winddata_2020_03.sql, it is possible to manually edit it, considering special user needs:
In addition, it is important to remember:
psql.exe -f dump-C-E-o-s_winddata_2020_03.sql postgres postgres
To load data it is necessary to use TOC file, which can be edited to change load order, or load only specific tables. To proceed with next command, please use the user postgres
pg_restore.exe -d winddata -Fc -a -L dump-Fc-a-E_winddata_2020_03.TOC -h localhost -U postgres
-e dump-Fc-a-E_winddata_2020_03.dump
It is possible to restore/load individual tables using next command, where “-n” refers to schema and “-t” refers to table name. Next command will restore table ideam_vvmx_60 from public schema (all tables belong to public schema)
pg_restore -d winddata -Fc -a -h localhost -U postgres -n public -t ideam_vvmx_60 -v
-e dump-Fc-a-E_winddata_2020_03.dump
This appendix includes all of the R chunks of code that were hidden throughout the document (using the include = FALSE or echo=FALSE, message=FALSE, warning=FALSE chunks tag) to help with readability and/or setup.
In Chapter @ref(rmd-data) - Data:
# List of packages required for this analysis
pkg <- c("dplyr", "sf", "ggplot2","rnaturalearth", "rnaturalearthdata", "ggspatial", "kableExtra", "ncdf4", "stars", "magick", "RcmdrMisc",
"knitr", "ggrepel", "grid", "gridExtra", "cowplot", "xts", "bookdown", "lubridate", "devtools")
# Check if packages are not installed and assign the
# names of the packages not installed to the variable new.pkg
new.pkg <- pkg[!(pkg %in% installed.packages())]
# If there are any packages in the list that aren't installed,
# install them
if (length(new.pkg))
install.packages(new.pkg, repos = "http://cran.rstudio.com")
# Load packages (thesisdown will load all of the packages as well)
library(thesisdown)
library(dplyr)
library(sf)
library(ggplot2)
library(rnaturalearth)
library(rnaturalearthdata)
library(ggspatial)
library(knitr)
library(kableExtra)
library(ncdf4)
library(stars)
library(magick)
library(RcmdrMisc)
library(ggrepel)
library(grid)
library(gridExtra)
library(cowplot)
library(xts)
library(lubridate)
#Load IDEAM and ISD Stations
con1 = src_postgres(dbname = "winddata", host = "localhost", port = 5432, user = "user1", password = "user1")
#Get Ideam Stations Table
originalfields4 = c("objectid", "codigo1", "nombre", "latitud", "longitud", "categoria")
originalfields4 = paste(originalfields4, collapse= ", ", sep = "")
query4 = paste("select", originalfields4, "from ideam_all_stations", "where inpqrs2 = 'YES'", sep=" ")
ideam_stations = as_tibble(tbl(con1, sql(query4)))
Encoding(ideam_stations$categoria) <- "UTF-8"
Encoding(ideam_stations$nombre) <- "UTF-8"
originalfields3 = c("id", "usaf", "station_name", "latitud", "longitud")
originalfields3 = paste(originalfields3, collapse= ", ", sep = "")
query3 = paste("select", originalfields3, "from isd_all_stations where usaf_isd_dataua != ''", sep=" ")
isd_stations = as_tibble(tbl(con1, sql(query3)))
#Create simple features from Ideam Stations
ideam_stations = st_as_sf(ideam_stations, coords = c("longitud", "latitud"), crs = 4326)
#Create simple features from ISD stations
isd_stations = st_as_sf(isd_stations, coords = c("longitud", "latitud"), crs = 4326)
#Plot IDEAM Stations
theme_set(theme_bw())
world <- ne_countries(scale = "medium", returnclass = "sf")
world_points<- st_centroid(world)
world_points <- cbind(world, st_coordinates(st_centroid(world$geometry)))
colombia = world_points$name == "Colombia"
panama = world_points$name == "Panama"
peru= world_points$name == "Peru"
brazil= world_points$name == "Brazil"
venezuela= world_points$name == "Venezuela"
ecuador= world_points$name == "Ecuador"
ggplot(data = world) +
geom_sf(fill= "antiquewhite") +
geom_text(data= world_points[venezuela,],aes(x=-67, y=8.5, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[panama,],aes(x=-79.2, y=9.2, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.2, y=-1, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-4, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-67, y=-2, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 13, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 4) +
annotate(geom = "text", x = -80, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 4) +
geom_sf(data = ideam_stations, size=1, aes(shape=categoria, color=categoria), show.legend = "point") +
scale_color_discrete(name = 'Category', labels = c("Agrometeorological", "Ordinary Climatic", "Main Climatic", "Mareographic",
"Special Meteorological", "Main Synoptic")) +
scale_shape_discrete(name = 'Category', labels = c("Agrometeorological", "Ordinary Climatic", "Main Climatic", "Mareographic",
"Special Meteorological", "Main Synoptic")) +
annotation_scale(location = "bl", width_hint = 0.5) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.05, "in"), pad_y = unit(0.05, "in"),
style = north_arrow_fancy_orienteering) +
coord_sf(xlim = c(-82.1, -63.8), ylim = c(-7.5, 15.5), expand = FALSE) +
xlab("Longitude") +
ylab("Latitude") +
ggtitle("IDEAM Stations") +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.5), panel.background = element_rect(fill = "aliceblue"))
#Plote time series - one IDEAM Station
originalfields = c("21205791")
newfields = paste ("X", originalfields, sep="")
originalfields = paste('"', originalfields, '"', sep = "")
newfields = paste('"', newfields, '"', sep = "")
fiedls_query = paste(originalfields, "as", newfields, sep = " ")
fiedls_query = c(paste('"', "mydatetime", '"', sep = ""), fiedls_query)
fiedls_query = paste (fiedls_query, "", sep= "", collapse=", ")
wherestring = c("21205791")
wherestring = paste('"', wherestring, '"', sep = "")
wherestring = paste(wherestring, "IS NOT NULL", sep = " ")
wherestring = paste(wherestring, collapse = " OR " , sep = " ")
query = paste("select", fiedls_query, "from ideam_vvmx_60", "where", wherestring, sep=" ")
all_vvmx_aut_60 = as_tibble(tbl(con1, sql(query)))
timestamp_all_vvmx_aut_60 <- as.POSIXct(as_tibble(select(all_vvmx_aut_60, mydatetime))$mydatetime,format="%Y-%m-%d %H:%M:%S", tz="UTC")
statideam_xts = na.omit(xts(x=select(all_vvmx_aut_60, "X21205791"), order.by = timestamp_all_vvmx_aut_60))
plot.xts(statideam_xts, main = "Station ID: 21205791\nWind Velocity [m/s]", major.ticks="years", format.labels = "%b-%d\n%Y", legend.loc = "top",
col="green", cex.main=0.3, cex=0.4, cex.axis=0.9, mar = c(2.5,1,0,1), oma = c(0,0,0,0))
#Plot ISD Stations
theme_set(theme_bw())
world <- ne_countries(scale = "medium", returnclass = "sf")
world_points<- st_centroid(world)
world_points <- cbind(world, st_coordinates(st_centroid(world$geometry)))
colombia = world_points$name == "Colombia"
panama = world_points$name == "Panama"
peru= world_points$name == "Peru"
brazil= world_points$name == "Brazil"
venezuela= world_points$name == "Venezuela"
ecuador= world_points$name == "Ecuador"
ggplot(data = world) +
geom_sf(fill= "antiquewhite") +
geom_text(data= world_points[venezuela,],aes(x=-67, y=8.5, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[panama,],aes(x=-79.2, y=9.2, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.2, y=-1, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-4, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-67, y=-2, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 13, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 4) +
annotate(geom = "text", x = -80, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 4) +
geom_sf(data = isd_stations, size=1, aes(color= "ISD Stations"), shape=2, show.legend = "point") +
scale_color_manual(values = c("ISD Stations" = "black"), name="") +
annotation_scale(location = "bl", width_hint = 0.5) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.05, "in"), pad_y = unit(0.05, "in"),
style = north_arrow_fancy_orienteering) +
coord_sf(xlim = c(-82.1, -63.8), ylim = c(-7.5, 15.5), expand = FALSE) +
xlab("Longitude") +
ylab("Latitude") +
ggtitle("Integrated Surface Database - ISD") +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.5), panel.background = element_rect(fill = "aliceblue"))
#Plot - ISD Station
originalfields1 = c("802590")
newfields1 = paste ("X", originalfields1, sep="")
originalfields1 = paste('"', originalfields1, '"', sep = "")
newfields1 = paste('"', newfields1, '"', sep = "")
fiedls_query1 = paste(originalfields1, "as", newfields1, sep = " ")
fiedls_query1 = c(paste('"', "mydatetime", '"', sep = ""), fiedls_query1)
fiedls_query1 = paste (fiedls_query1, "", sep= "", collapse=", ")
wherestring1 = c("802590")
wherestring1 = paste('"', wherestring1, '"', sep = "")
wherestring1 = paste(wherestring1, "IS NOT NULL", sep = " ")
wherestring1 = paste(wherestring1, collapse = " OR " , sep = " ")
query1 = paste("select", fiedls_query1, "from isd_lite_unstack", "where", wherestring1, sep=" ")
isdlite = as_tibble(tbl(con1, sql(query1)))
timestamp_isdlite <- as.POSIXct(as_tibble(select(isdlite, mydatetime))$mydatetime,format="%Y-%m-%d %H:%M:%S", tz="UTC")
statisd_xts = na.omit(xts(x=select(isdlite, "X802590"), order.by = timestamp_isdlite))
plot.xts(statisd_xts, main = "Station ID: 802590\nWind Velocity [m/s]", major.ticks="years", format.labels = "%b-%d\n%Y", legend.loc = "top",
col="green", cex.main=0.3, cex=0.4, cex.axis=0.9, mar = c(2.5,1,0,1), oma = c(0,0,0,0))
#Load ERA5 NetCDF dataset - variable fg10
ncname <- "outfile_nc4c_zip9"
filename <- paste("./data/", ncname, ".nc", sep = "")
ncin <- nc_open(filename)
lon <- ncvar_get(ncin, "longitude")
nlon = dim(lon)
lat <- ncvar_get(ncin, "latitude")
nlat = dim(lat)
ntime <- dim(ncvar_get(ncin, "time"))
variablename <- "fg10"
fg10.units <- ncatt_get(ncin, variablename, "units")
fg10.units
lonlat.unstack <- expand.grid(lon=as.numeric(lon), lat=as.numeric(lat))
#Create ERA5 centers (point with lat, lon, and value, as cell index)
era5colpoints = st_as_sf(lonlat.unstack, coords=1:2, crs=st_crs(4326))
era5colpoints$value = 1:(nlon*nlat)
#Define stars object to match with ERA5 bounding box.
#Cell centers of stars object, need to be same cell centers of ERA5
pointsbbox = st_bbox(era5colpoints)
cellsize = lonlat.unstack$lon[2]- lonlat.unstack$lon[1]
mybbox = st_bbox(c(pointsbbox$xmin - (cellsize/2), pointsbbox$xmax + (cellsize/2), pointsbbox$ymax + (cellsize/2), pointsbbox$ymin - (cellsize/2)),
crs = st_crs(4326))
era5colraster.st = st_rasterize(era5colpoints, st_as_stars(mybbox, nx = nlon, ny = nlat, values = era5colpoints$value))
#Load ERA5 polygon vectors, representing cells of ERA5
file_era5_sf_pol = "./data/era5grid_left_right_pol.shp"
era5_4326_sf_pol = st_read(dsn=file_era5_sf_pol)
pts <- do.call(rbind, st_centroid(st_geometry(era5_4326_sf_pol)))
x = pts[,1]
y = pts[,2]
era5_4326_sf_pol$x = x
era5_4326_sf_pol$y = y
era5_4326_sf_pol_filter_corners = era5_4326_sf_pol %>% filter(DN %in% c(1, 49, 3333, 3381))
era5_4326_sf_pol_filter_corners_left = era5_4326_sf_pol %>% filter(DN %in% c(1, 3333))
era5_4326_sf_pol_filter_corners_right = era5_4326_sf_pol %>% filter(DN %in% c(49, 3381))
#Plot ERA5 Stations
theme_set(theme_bw())
world <- ne_countries(scale = "medium", returnclass = "sf")
world_points<- st_centroid(world)
world_points <- cbind(world, st_coordinates(st_centroid(world$geometry)))
colombia = world_points$name == "Colombia"
panama = world_points$name == "Panama"
peru= world_points$name == "Peru"
brazil= world_points$name == "Brazil"
venezuela= world_points$name == "Venezuela"
ecuador= world_points$name == "Ecuador"
big = ggplot(data = world) +
geom_sf(fill= "antiquewhite") +
geom_sf(data=era5colpoints, size=0.1, aes(color = "Stations"), shape=".", show.legend = "point")+
scale_color_manual(values = c("Stations" = "black"), name="ERA5", guide = guide_legend(override.aes = list(fill= c(NA), linetype = c("blank"),
shape = c(".")))) +
geom_sf(data = era5_4326_sf_pol_filter_corners, color = "black", aes(fill="Cells"), size=0.1, alpha=1, show.legend = "polygon") +
scale_fill_manual(values = c("Cells" = NA), name="", guide = guide_legend(override.aes = list(fill = c(NA), shape = c(NA), size=0.1))) +
geom_rect(mapping=aes(xmin=-79.252968100, xmax=-78.247031900, ymin=11.832846362, ymax=12.667153638), color="red", alpha=0, size=0.1) +
geom_rect(mapping=aes(xmin=-67.752968100, xmax=-66.747031900, ymin=11.832846362, ymax=12.667153638), color="red", alpha=0, size=0.1) +
geom_rect(mapping=aes(xmin=-79.258632089, xmax=-78.241367911, ymin=-4.671851259, ymax=-3.828148741), color="red", alpha=0, size=0.1) +
geom_rect(mapping=aes(xmin=-67.758632089, xmax=-66.741367911, ymin=-4.671851259, ymax=-3.828148741), color="red", alpha=0, size=0.1) +
geom_text(data= world_points[venezuela,],aes(x=-66.3, y=8.5, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[panama,],aes(x=-79.7, y=9.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-79.5, y=-1.5, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75.5, y=-5.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-66, y=-2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 2) +
annotate(geom = "text", x = -80, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 2) +
geom_text_repel(data = era5_4326_sf_pol_filter_corners_left, size=2, aes(x=x, y=y, label = DN), direction="y", segment.size=0.1,
segment.color= "grey50", color="grey50", nudge_x=-1, hjust=1, box.padding=0.1) +
geom_text_repel(data = era5_4326_sf_pol_filter_corners_right, size=2, aes(x=x, y=y, label = DN), direction="y", segment.size=0.1,
segment.color= "grey50", color="grey50", nudge_x=+1, hjust=0, box.padding=0.1) +
coord_sf(xlim = c(-81.1, -64.8), ylim = c(-5, 13), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("ERA5 Reanalysis - Forecast") +
theme(plot.title = element_text(size=8)) +
theme(axis.text.x= element_text(size=7)) +
theme(axis.text.y= element_text(size=7)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(legend.title = element_text(size=8)) +
theme(legend.text=element_text(size=8)) +
theme(legend.key.size = unit(0.5,"line")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(axis.text.x = element_text(margin = margin(t =2, b = -10))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -10)))
corner1lt = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_sf(data = era5_4326_sf_pol, colour="black", fill=NA, size=0.1) +
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.5, y=9.2, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-79.2, y=-1, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 4) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 4) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
geom_sf_text(data = era5_4326_sf_pol, aes(label = DN), size=2) +
coord_sf(xlim = c(-79.252968100, -78.247031900), ylim = c(11.832846362, 12.667153638), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("") +
theme(panel.grid = element_blank()) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
theme(axis.ticks = element_blank()) +
theme(plot.margin=grid::unit(c(0,0.2,0,0),"cm")) +
theme(panel.border = element_rect(colour = "red"))+
theme(axis.ticks.length=unit(0, "null")) +
theme(axis.title.x=element_blank()) +
theme(axis.title.y=element_blank()) +
theme(plot.title = element_blank())
corner2rt = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_sf(data = era5_4326_sf_pol, colour="black", fill=NA, size=0.1) +
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.5, y=9.2, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-79.2, y=-1, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 4) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 4) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
geom_sf_text(data = era5_4326_sf_pol, aes(label = DN), size=2) +
coord_sf(xlim = c(-67.752968100, -66.747031900), ylim = c(11.832846362, 12.667153638), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("") +
theme(panel.grid = element_blank()) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
theme(axis.ticks = element_blank()) +
theme(plot.margin=grid::unit(c(0,0,0,0.2),"cm")) +
theme(panel.border = element_rect(colour = "red"))+
theme(axis.ticks.length=unit(0, "null")) +
theme(axis.title.x=element_blank()) +
theme(axis.title.y=element_blank()) +
theme(plot.title = element_blank())
corner3lb = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_sf(data = era5_4326_sf_pol, colour="black", fill=NA, size=0.1) +
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.5, y=9.2, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-79.2, y=-1, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 4) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 4) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
geom_sf_text(data = era5_4326_sf_pol, aes(label = DN), size=2) +
coord_sf(xlim = c(-79.258632089, -78.241367911), ylim = c(-4.671851259, -3.828148741), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("") +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
theme(axis.ticks = element_blank()) +
theme(plot.margin=grid::unit(c(0,0.2,0,0),"cm")) +
theme(panel.border = element_rect(colour = "red"))+
theme(axis.ticks.length=unit(0, "null")) +
theme(axis.title.x=element_blank()) +
theme(axis.title.y=element_blank()) +
theme(plot.title = element_blank())
corner4rb = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_sf(data = era5_4326_sf_pol, colour="black", fill=NA, size=0.1) +
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.5, y=9.2, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-79.2, y=-1, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 4) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 4) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
geom_sf_text(data = era5_4326_sf_pol, aes(label = DN), size=2) +
coord_sf(xlim = c(-67.758632089, -66.741367911), ylim = c(-4.671851259, -3.828148741), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("") +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
theme(axis.ticks = element_blank()) +
theme(plot.margin=grid::unit(c(0,0,0,0.2),"cm")) +
theme(panel.border = element_rect(colour = "red"))+
theme(axis.ticks.length=unit(0, "null")) +
theme(axis.ticks.margin=unit(0, "null")) +
theme(axis.title.x=element_blank()) +
theme(axis.title.y=element_blank()) +
theme(plot.title = element_blank())
grid.arrange(big, arrangeGrob(corner1lt, corner2rt, corner3lb, corner4rb), ncol=2, widths=c(2.2,1))
In Chapter @ref(rmd-thefra) - Theoretical Framework:
# List of packages required for this analysis
pkg <- c("RcmdrMisc")
# Check if packages are not installed and assign the
# names of the packages not installed to the variable new.pkg
new.pkg <- pkg[!(pkg %in% installed.packages())]
# If there are any packages in the list that aren't installed,
# install them
if (length(new.pkg))
install.packages(new.pkg, repos = "http://cran.rstudio.com")
# Load packages (thesisdown will load all of the packages as well)
library(RcmdrMisc)
library(knitr)
par(mar=c(2.5,2.5,2,0))
par(oma=c(0,0,0,0))
par(mgp=c(1.5,0.5,0))
location = 100
scale = 40
.x <- seq(0, 300, length.out=1000)
pdfG <- function(x) {
1/location *exp(-(x-location)/scale)*exp(-exp(-(x-location)/scale))
}
.y = pdfG(.x)
plot(.x, .y, col="green", lty=4, xlab="Velocities Km/h", ylab="Density Function - Gumbel Distribution", cex.axis = 0.5, cex.lab= 0.6, cex.main=0.7,
main=paste("Gumbel - Density Function Gumbel Distribution\n", "Location=", round(location,2), " Scale=", round(scale,2)), type="l", cex.sub=0.6)
par(mar=c(2.5,2.5,2,0))
par(oma=c(0,0,0,0))
par(mgp=c(1.5,0.5,0))
location = 100
scale = 40
.x <- seq(0, 300, length.out=1000)
dfG = dgumbel(.x, location=location, scale=scale)
plot(.x, dfG, col="red", lty=4, xlab="Velocities Km/h", ylab="Density Function - Gumbel Distribution", cex.axis = 0.5, cex.lab= 0.6, cex.main=0.7,
main=paste("Gumbel - Density Function Gumbel Distribution\n", "Location=", round(location,2), " Scale=", round(scale,2)), type="l", cex.sub=0.6)
par(mar=c(2.5,2.5,2,0))
par(oma=c(0,0,0,0))
par(mgp=c(1.5,0.5,0))
location = 100
scale = 40
.x <- seq(0, 300, length.out=1000)
cdfG <- function(x) {
exp(-exp(-(x-location)/scale))
}
.y = cdfG(.x)
plot(.x, .y, col="green", lty=4, xlab="Velocities Km/h", ylab="Probability", cex.axis = 0.5, cex.lab= 0.6, cex.main=0.7, type="l",
main=paste("Gumbel - Cumulative Distribution Function\n", "Location=", round(location,2), " Scale=", round(scale,2)), cex.sub=0.6)
par(mar=c(2.5,2.5,2,0))
par(oma=c(0,0,0,0))
par(mgp=c(1.5,0.5,0))
location = 100
scale = 40
.x <- seq(0, 1, length.out=1000)
ppfG <- function(x) {
location - (scale*log(-log(x)))
}
.y = ppfG(.x)
plot(.x, .y, col="green", lty=4, ylab="Velocities Km/h", xlab="Probability", cex.axis = 0.5, cex.lab= 0.6, cex.main=0.7, cex.sub=0.6,
main=paste("Gumbel - Percent Point Function\n", "Location=", round(location,2), " Scale=", round(scale,2)), type="l")
par(mar=c(2.5,2.5,2,0))
par(oma=c(0,0,0,0))
par(mgp=c(1.5,0.5,0))
location = 100
scale = 40
.x <- seq(0, 1500, length.out=1000)
hfG <- function(x) {
(1/scale)*(exp(-(x-location)/scale))/(exp(exp(-(x-location)/scale))-1)
}
.y = hfG(.x)
plot(.x, .y, col="green", lty=4, xlab="Velocities Km/h", ylab="Hazard", cex.axis = 0.5, cex.lab= 0.6, cex.main=0.7, cex.sub=0.6,
main=paste("Gumbel - Hazard Function\n", "Location=", round(location,2), " Scale=", round(scale,2)), type="l", xlim=c(0,500))
par(mar=c(3,3,0,0))
par(oma=c(0,0,0,0))
par(mgp=c(2,1,0))
plot(1, type="n", xlab=expression(paste("Compound Probability ", P[n])), ylab="Exposure Time as a Multiple of MRI",
xlim=c(0,1), ylim= c(0,2500), xaxt ="n", yaxt="n", bty="n", cex.lab=0.7)
y1 = c(0, 500,1000,1500,2000,2500)
text(y=y1, x=par("usr")[1], labels = y1/500, srt = 0, pos = 2, xpd = TRUE, cex=0.8)
y1 = 500*0.69
text(y=y1, x=par("usr")[1], labels = ".69", srt = 0, pos = 2, xpd = TRUE, cex=0.6)
y1 = 2250
text(y=y1, x=par("usr")[1], labels = expression(paste("4",frac(1,2))), srt = 0, pos = 2, xpd = TRUE, cex=0.6)
npn <- function(x) (1-(1/500))^x #Event will not occur
n = seq(from=0, to=2500, by=1)
mynpn = npn(n)
lines(x=mynpn,y=n, col= "blue")
pn <- function(x) 1-(1-(1/500))^x #Event will occur
mypn= pn(n)
lines(x=mypn,y=n, col= "green")
text(x=c(0.01, 0.37,0.63, 0.99), par("usr")[3], labels = c(".01",".37",".63",".99") , srt = 0, pos = 1, xpd = TRUE, cex=0.6)
axis(1, at= seq(from=0, to=1, by= 0.1), labels=seq(from=0, to=1, by= 0.1), tick=TRUE, col.axis="black", cex=0.8)
axis(2, at=c(0, 500,1000,1500,2000,2500),labels=FALSE, tick=TRUE, col.axis="black")
axis(2, at=c(345, 2250),labels=FALSE, tick=TRUE, col.axis="black", tck=-0.015)
axis(1, at=c(0.01,.37,.63,0.99),labels=FALSE, tick=TRUE, col.axis="black", tck=-0.015)
abline(v=c(0.01, 0.37,0.5, 0.63,0.99), lty="dotted")
abline(h=c(345,500, 2250), lty="dotted")
text(x=0.15, y=1800, labels = "chance event\nwill not occur", cex=0.7)
text(x=0.85, y=1800, labels = "chance event\nwill occur", cex=0.7)
In Chapter @ref(rmd-method) - Methodology:
plotit<-function(){
par(mar=c(2,2,0,0))
par(oma=c(0,0,0,0))
par(bg=NA)
plot(1, xlab='', ylab='', type='n', yaxt='n', xaxt='n', tck=0, xlim=c(0,200), ylim=c(0,0.05), bg = 'transparent', bty="n")
arrows(0,0,0,0.05, length=0.04)
arrows(0,0,200,0, length=0.04)
text(x = par("usr")[2] - 5, y = par("usr")[3] - 0.005, labels = expression(frac(1, N)), xpd = NA, srt = 0, cex = 0.7)
text(x = par("usr")[1] - 6, y = 0.05, labels = expression(Y[N]), xpd = NA, srt = 0, cex = 0.7)
text(x = 50.2, y = par("usr")[3] - 0.003, labels = "?", xpd = NA, srt = 0, cex = 0.7)
text(x = 68.5, y = par("usr")[3] - 0.003, labels = "0.02", xpd = NA, srt = 0, cex = 0.7)
text(x = 100, y = par("usr")[3] - 0.003, labels = "0.03", xpd = NA, srt = 0, cex = 0.7)
text(x = par("usr")[1] - 10, y = 0.015, labels = expression(paste("30 ", frac(Km, h))), xpd = NA, srt = 0, cex = 0.7)
text(x = par("usr")[2] - 2, y = 0.048, labels = "Combined", xpd = NA, srt = 0, pos = 2, cex = 0.6)
text(x = par("usr")[2] - 2, y = 0.031, labels = "Hurricanes", xpd = NA, pos = 2, srt = 0, cex = 0.6)
text(x = par("usr")[2] - 2, y = 0.021, labels = "Non-Hurricanes", xpd = NA, pos = 2, srt = 0, cex = 0.6)
myexp = expression(paste(P[e], " = 1 - (1 -", P[nh], ") * (1 - ", P[h], ")"))
text(x = par("usr")[1] + 60, y = 0.049, labels = myexp, xpd = NA, srt = 0, cex = 0.7)
text(x = par("usr")[1] + 60, y = 0.045, labels = "? = 1- (1 – 0.03)(1-0.02)", xpd = NA, srt = 0, cex = 0.6)
location = 65
scale = 20
.x <- seq(0, 1500, length.out=1000)
hfG <- function(x) {
(1/scale)*(exp(-(x-location)/scale))/(exp(exp(-(x-location)/scale))-1)
}
curve(hfG, add=T, col="red", lwd=1, lty=5)
Arrows (x0=50.2, y0=0, x1=50.2, y1=(hfG(50.2)-0.003), arr.type="triangle", arr.width=0.04, lwd=0.1)
location = 80
scale = 30
.x <- seq(0, 1500, length.out=1000)
curve(hfG, add=T, col="red", lwd=1)
Arrows (x0=68.5, y0=0, x1=68.5, y1=(hfG(68.5)-0.003), arr.type="triangle", arr.width=0.04, lwd=0.1)
location = 100
scale = 40
.x <- seq(0, 1500, length.out=1000)
curve(hfG, add=T, col="red", lwd=1)
Arrows (x0=100, y0=0, x1=100, y1=(hfG(100)-0.003), arr.type="triangle", arr.width=0.04, lwd=0.1)
Arrows (x0=100, y0=hfG(100), x1=7, y1=hfG(100) , arr.type="triangle", arr.width=0.04, lwd=0.1)
}
z.plot1<-function(){plotit()}
mydataframe = data.frame(v = c(10, 20, 30, "...", 350, "..."), Pe = c("...", "...", "?", "...", "...", "..."))
names(mydataframe) <- c(expression(Y[N]), expression(P[e]))
tt <- ttheme_default(base_size = 7, colhead=list(fg_params = list(parse=TRUE)))
tbl <- tableGrob(mydataframe, rows=NULL, theme=tt)
plot_grid(z.plot1, tbl, ncol = 2, rel_widths = c(4,1), labels=c("", "Combined Curve"), label_size = 7, hjust=-0.13)
In Chapter @ref(rmd-results) - Results and Discussion:
#Downscaling Support - Sources Comparison
#
#Important Note for next line of code (seven lines bellow):
#
#Not run when Knitting. Run it only in RStudio before Knit process, that is why next line of code is out-commented (using #).
#This code generates some graphics in format RDS, used inside Results and Discussion chapter. It is not necessary to run again,
#unless current RDS files already generated, are not usable for changes in R packages versions (possible situation for future)
#Output of this code goes to the folder ‘…/index/data/’ of the ‘thesisdown’ root folder.
#
#source('./code/comparing_sources_pqrs_20199050080932_VV_AUT.r')
#
#
#IDEAM VV_AUT_2 - Quality Data Comparison
#
con1 = src_postgres(dbname = "winddata", host = "localhost", port = 5432, user = "user1", password = "user1")
#Get Ideam Stations Table
originalfields4 = c("objectid", "codigo1", "nombre", "latitud", "longitud", "latitud", "longitud", "categoria")
newnames4 = c("objectid", "codigo1", "nombre", "latitud", "longitud", "y", "x", "categoria")
originalfields4 = paste(originalfields4, " as ", newnames4, sep="")
originalfields4 = paste(originalfields4, collapse= ", ", sep = "")
query4 = paste("select", originalfields4, "from ideam_all_stations", "where codigo1 IN (48015050, 52055230, 26125061, 26125710, 23085270, 27015330,
16015501, 23195502, 13035501, 28025502, 15065180, 29045190)", sep=" ")
ideam_stations = as_tibble(tbl(con1, sql(query4)))
Encoding(ideam_stations$categoria) <- "UTF-8"
Encoding(ideam_stations$nombre) <- "UTF-8"
originalfields3 = c("id", "usaf", "station_name", "latitud", "longitud", "latitud", "longitud")
newnames3 = c("id", "usaf", "station_name", "latitud", "longitud", "y", "x")
originalfields3 = paste(originalfields3, " as ", newnames3, sep="")
originalfields3 = paste(originalfields3, collapse= ", ", sep = "")
query3 = paste("select", originalfields3, "from isd_all_stations where usaf IN ('803980', '803700', '802110', '802100', '801120', '801100',
'800970', '800940', '800630', '800360', '800350', '800280')", sep=" ")
isd_stations = as_tibble(tbl(con1, sql(query3)))
ideam_stations = st_as_sf(ideam_stations, coords = c("longitud", "latitud"), crs = 4326)
isd_stations = st_as_sf(isd_stations, coords = c("longitud", "latitud"), crs = 4326)
ideam_stations3 <- ideam_stations %>% filter(codigo1 %in% c(15065180, 29045190, 28025502))
ideam_stationso <- ideam_stations %>% filter(codigo1 %in% c(48015050, 52055230, 26125061, 26125710, 23085270, 27015330, 16015501, 23195502,
13035501))
isd_stations3 <- isd_stations %>% filter(usaf %in% c('800350', '800280', '800360'))
isd_stationso <- isd_stations %>% filter(usaf %in% c('803980', '803700', '802110', '802100', '801120', '801100', '800970', '800940', '800630'))
file_era5_st = "./data/era5grid_left_right.tif"
era5_4326_st = read_stars(file_era5_st)
era5_4326_st = setNames(era5_4326_st, "Station")
file_col_st = "./data/col2.tif"
col_4326_st = read_stars(file_col_st)
col_4326_st = setNames(col_4326_st, "col_4326_st")
file_era5_sf_point = "./data/era5grid_left_right.shp"
era5_4326_sf_point = st_read(dsn=file_era5_sf_point)
file_era5_sf_pol = "./data/era5grid_left_right_pol.shp"
era5_4326_sf_pol = st_read(dsn=file_era5_sf_pol)
file_col_sf_pol = "./data/COLOMBIA.shp"
col_4326_sf_pol = st_read(dsn=file_col_sf_pol)
img_stack_col=stack(file_col_st)
img_stack_col.crop = crop(img_stack_col, extent(col_4326_sf_pol))
img_stack_col.mask = mask(img_stack_col.crop, col_4326_sf_pol)
myPalette <- colorRampPalette(rev(brewer.pal(11, "YlGn")))
sc <- scale_fill_gradientn(colours = myPalette(100), limits=c(0, 255))
theme_set(theme_bw())
world <- ne_countries(scale = "medium", returnclass = "sf")
world_points<- st_centroid(world)
world_points <- cbind(world, st_coordinates(st_centroid(world$geometry)))
colombia = world_points$name == "Colombia"
panama = world_points$name == "Panama"
peru= world_points$name == "Peru"
brazil= world_points$name == "Brazil"
venezuela= world_points$name == "Venezuela"
ecuador= world_points$name == "Ecuador"
k = era5_4326_sf_pol
big <-ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_text(data= world_points[venezuela,],aes(x=-68.5, y=8.5, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.5, y=9.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78, y=-1, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-3.7, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 4) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 4) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
geom_rect(mapping=aes(xmin=-73.7, xmax=-72.8, ymin=10.1, ymax=10.9), color="red", alpha=0, size=0.1) +
geom_sf(data = ideam_stations, size=2, shape=23, color= "red", show.legend = "point") +
geom_text_repel(data = ideam_stationso, size=2, aes(x=x, y=y, label = codigo1), direction="x", segment.size=0.1) +
geom_text_repel(data = ideam_stations3, size=2, aes(x=x, y=y, label = codigo1), direction="x", segment.size=0.1, nudge_x=-0.5) +
geom_sf(data = isd_stations, size=2, shape=3, color= "blue", show.legend = "point") +
geom_text_repel(data = isd_stationso, size=2, aes(x=x, y=y, label = usaf), direction="y", segment.size=0.1) +
geom_text_repel(data = isd_stations3, size=2, aes(x=x, y=y, label = usaf), direction="x", segment.size=0.1, nudge_x=0.5) +
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.2, "cm"), line_width = 0.5, text_cex = 0.5) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.05, "in"), pad_y = unit(0.05, "in"), height = unit(1, "cm"),
width = unit(1, "cm"), style = north_arrow_fancy_orienteering) +
coord_sf(xlim = c(-79.5, -66.5), ylim = c(-5.4, 12.3), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Quality Data Comparison\nTwelve matching stations from IDEAM and ISD") +
theme(plot.title = element_text(size=8)) +
theme(axis.text.x= element_text(size=7)) +
theme(axis.text.y= element_text(size=7)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(axis.text.x = element_text(margin = margin(t =2, b = -10))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -10)))
small <- ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_sf(data = k, colour="black", fill=NA, size=0.1) +
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.5, y=9.2, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-79.2, y=-1, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=3, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 4) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 4) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
geom_sf(data = ideam_stations, size=3, shape=23, color= "red", show.legend = "point") +
geom_sf_text(data = ideam_stations, aes(label = codigo1), size=2, position = position_nudge(x = -0.11)) +
geom_sf(data = isd_stations, size=3, shape=3, color= "blue", show.legend = "point") +
geom_sf_text(data = isd_stations, aes(label = usaf), size=2, position = position_nudge(x = +0.09)) +
geom_sf_label(data = k, aes(label = DN), size=2) +
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.2, "cm")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.05, "in"), pad_y = unit(0.05, "in"), height = unit(1, "cm"),
width = unit(1, "cm"), style = north_arrow_fancy_orienteering) +
coord_sf(xlim = c(-73.7, -72.8), ylim = c(10.1, 10.9), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Quality Data Comparison\nStations 28025502 from IDEAM, \n800360 from ISD, and 416 from ERA5") +
theme(plot.title = element_text(size=8)) +
theme(axis.text.x= element_text(size=7)) +
theme(axis.text.y= element_text(size=7)) +
theme(panel.border = element_rect(colour = "red"))+
#theme(plot.background = element_rect(color = "black")) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.5), panel.background = element_rect(fill = "aliceblue"))
grid.arrange(big, small, ncol=2)
#
#Quality Data Comparison. High similarity between sources
#
dat1 <- readRDS("data/comparison13.rds")
dat1
#
#IDEAM VV_AUT_10 - Non Quality Data Comparison
#
myPalette <- colorRampPalette(rev(brewer.pal(11, "YlGn")))
sc <- scale_fill_gradientn(colours = myPalette(100), limits=c(0, 255))
theme_set(theme_bw())
world <- ne_countries(scale = "medium", returnclass = "sf")
world_points<- st_centroid(world)
world_points <- cbind(world, st_coordinates(st_centroid(world$geometry)))
colombia = world_points$name == "Colombia"
panama = world_points$name == "Panama"
peru= world_points$name == "Peru"
brazil= world_points$name == "Brazil"
venezuela= world_points$name == "Venezuela"
ecuador= world_points$name == "Ecuador"
pts <- do.call(rbind, st_centroid(st_geometry(era5_4326_sf_pol)))
x = pts[,1]
y = pts[,2]
era5_4326_sf_pol$x = x
era5_4326_sf_pol$y = y
era5_4326_sf_pol_filter_good = era5_4326_sf_pol %>% filter(DN %in% c(2261, 1971,2066,2020,2260,1875,2213,2637,1442,1583,1501,1582,1381,1493,1485,
1397,1338,1055,511,1644,515,221,1038))
era5_4326_sf_pol_filter_very_good = era5_4326_sf_pol %>% filter(DN %in% c(265,360,78,312,416))
era5_4326_sf_pol_col1 = era5_4326_sf_pol %>% filter(DN %in% c(1, 50, 148, 246, 344, 442, 540, 638, 736, 834, 932, 1030, 1128, 1226, 1324, 1422,
1520, 1618, 1716, 1814, 1912, 2010, 2108, 2206, 2304, 2402, 2500, 2598, 2696, 2794, 2892, 2990, 3088, 3186, 3284, 3333))
era5_4326_sf_pol_col49 = era5_4326_sf_pol %>% filter(DN %in% c(49, 147, 245, 343, 441, 539, 637, 735, 833, 931, 1029, 1127, 1225, 1323, 1421, 1519,
1617, 1715, 1813, 1911, 2009, 2107, 2205, 2303, 2401, 2499, 2597, 2695, 2793, 2891, 2989, 3087, 3185, 3283, 3381))
ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1, alpha=0.7) +
ggRGB(img_stack_col.mask, r = 1, g = 2, b = 3, ggLayer = TRUE, alpha=0.8)+
sc+
geom_sf(data = era5_4326_sf_pol, colour="grey", fill=NA, size=0.1) +
geom_sf(data = era5_4326_sf_pol_filter_good, aes(fill = "Good: 23"), colour="black", size=0.1, alpha=1, show.legend = "polygon") +
geom_sf(data = era5_4326_sf_pol_filter_very_good, aes(fill = "Very Good: 5"), colour="black", size=0.1, alpha=1, show.legend = "polygon") +
scale_fill_manual(values = c("Good: 23" = "yellow", "Very Good: 5" = "orange"), name="Downscaling Support") + #, label=c("dd")) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
geom_text_repel(data = era5_4326_sf_pol_col1, size=2, aes(x=x, y=y, label = DN), direction="y", segment.size=0.1, segment.color= "grey50",
color="grey50", nudge_x=-1, hjust=1, box.padding=0.1) +
geom_text_repel(data = era5_4326_sf_pol_col49, size=2, aes(x=x, y=y, label = DN), direction="y", segment.size=0.1, segment.color= "grey50",
color="grey50", nudge_x=+1, hjust=0, box.padding=0.1) +
coord_sf(xlim = c(-81.25, -64.75), ylim = c(-5, 13), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("ERA5 grid, cells IDs from 1 to 3381 (49 cols, 69 rows)\nISD-IDEAM-ERA5 'poor data' comparison") +
theme(plot.title = element_text(size=8)) +
theme(axis.text.x= element_text(size=7)) +
theme(axis.text.y= element_text(size=7)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(axis.text.x = element_text(margin = margin(t =2, b = -10))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -10)))
#
#Non Quality Data Comparison. Time Series Graphic for 'Very Good' Downscaling Support
#
tl = readRDS(file="./data/eracomparisonideam_188_78_plotxts.rds")
tl
#
#Non Quality Data Comparison: Scatter plots for 'Very Good' Downscaling Support
#
par(mfrow=c(3,2))
par(mar=c(0,0,0,0))
par(oma=c(0,0,0,0))
cl = readRDS(file="./data/eracomparisonideam_129_1_2_265_plotgg.rds")
cr = readRDS(file="./data/eracomparisonideam_195_1_2_312_plotgg.rds")
bl = readRDS(file="./data/eracomparisonideam_188_1_3_78_plotgg.rds")
br = readRDS(file="./data/eracomparisonideam_188_1_2_78_plotgg.rds")
plot_grid(cl, cr, bl, br, ncol = 2, labels=c("A", "B", "C", "D"), label_size = 7, label_colour = "red")
#POT-PP in one ISD Station
#
#Output of this code goes to the folder ‘…/index/data/’ of the ‘thesisdown’ root folder.
source('./code/pp_one_station.r')
#
#Non-Thunderstorm Time Series for ISD station 801120. Left: Raw Data. Right: De-clustered Data
#
plot_grid(myprint1, NULL, myprint5, ncol = 3, rel_widths = c(1,0.26,1))
#
#POT - Thresholding
#
dat <- readRDS("data/myprint6.rds")
dat
#
#Graphic Diagnosis Of Goodness of Fit. Station 801120
#
replayPlot(myprint8)
add_label <- function(xfrac, yfrac, label, pos = 4, ...) {
u <- par("usr")
x <- u[1] + xfrac * (u[2] - u[1])
y <- u[4] - yfrac * (u[4] - u[3])
text(x, y, label, pos = pos, ...)
}
par(xpd = TRUE)
add_label(-0.06, 0.01, "A", cex=0.6, col="red")
add_label(0.47, 0.01, "B", cex=0.6, col="red")
add_label(-0.06, 0.54, "C", cex=0.6, col="red")
add_label(0.47, 0.54, "D", cex=0.6, col="red")
#
#Hazard Curve. Station 801120
#
x= 1/paP
y= yvels
df = data.frame(x = x, y = y)
x1= tipicalReturnPeriods
y1= veocitiesfortypicalreturnperiodsP
df2 = data.frame(x = x1, y = y1)
df2_more1700 <- df2 %>% filter(x >= c(1700))
df2_less1700 <- df2 %>% filter(x < c(1700))
ggplot(data=df, aes(x=x, y=y, group=1)) +
geom_line(color="red")+
geom_point(data=df2, aes(x=x, y=y, group=1), shape = 18, color = "black", fill="lightgray") +
xlim(0,8500) +
ylim(125, 300) +
geom_text_repel(data=df2_more1700, size=2, aes(x=x, y=y, label = paste0("(",x,",",round(y, digits=1),")")), direction="y", segment.size=0.1,
nudge_y=15) +
geom_text_repel(data=df2_less1700, size=2, aes(x=x, y=y, label = paste0("(",x,",",round(y, digits=1),")")), direction="x", segment.size=0.1,
nudge_x=1) +
xlab("Return Periods (Years) - POT-PP Intensity Function") +
ylab("Velocities Km/h") +
ggtitle(paste("Declustered - Non-Thunderstorms - Hazard Curve", "\n", "Location=", round(z6,2), "Scale=", round(z7,2))) +
theme(plot.title = element_text(size=8)) +
theme(axis.text.x= element_text(size=6)) +
theme(axis.text.y= element_text(size=6)) +
theme(axis.title =element_text(size=7))
#
#Hurricane Wind Maps
#
file_rl_h_4326_700_st = "./data/hurricanemaps/h_700.tif"
rl_h_4326_700_st = read_stars(file_rl_h_4326_700_st)
rl_h_4326_700_st = setNames(rl_h_4326_700_st, "Kph")
file_rl_h_4326_1700_st = "./data/hurricanemaps/h_1700.tif"
rl_h_4326_1700_st = read_stars(file_rl_h_4326_1700_st)
rl_h_4326_1700_st = setNames(rl_h_4326_1700_st, "h_1700")
file_rl_h_4326_3000_st = "./data/hurricanemaps/h_3000.tif"
rl_h_4326_3000_st = read_stars(file_rl_h_4326_3000_st)
rl_h_4326_3000_st = setNames(rl_h_4326_3000_st, "h_3000")
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
sc <- scale_fill_gradientn(colours = myPalette(100), limits=c(27, 438), breaks=c(27, 100, 200, 300, 400, 438))
theme_set(theme_bw())
world <- ne_countries(scale = "medium", returnclass = "sf")
world_points<- st_centroid(world)
world_points <- cbind(world, st_coordinates(st_centroid(world$geometry)))
colombia = world_points$name == "Colombia"
panama = world_points$name == "Panama"
peru= world_points$name == "Peru"
brazil= world_points$name == "Brazil"
venezuela= world_points$name == "Venezuela"
ecuador= world_points$name == "Ecuador"
c700 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_h_4326_700_st, aes(fill = Kph, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-69.4, y=8.5, label=name), color="darkblue", fontface="bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[panama,],aes(x=-79.1, y=9.2, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=5.5, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 13.4, label = "Caribbean Sea", fontface = "italic", color = "grey22", size = 1.8) +
annotate(geom = "text", x = -79, y = 5.6, label = "Pacific Sea", fontface = "italic", color = "grey22", size = 1.8) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1) +
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-80.5, -67.7), ylim = c(4.5, 13.9), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Hurricane Wind Map. \nMRI-700 years - Ingeniar Ltda") +
theme(plot.title = element_text(size=6)) +
theme(axis.text.x= element_text(size=5)) +
theme(axis.text.y= element_text(size=5)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -10))) +
theme(legend.title = element_text(size = 6)) +
theme(legend.text = element_text(size = 5)) +
theme(legend.background = element_blank()) +
theme(legend.key.width = unit(0.1, "cm")) +
theme(legend.key.height = unit(0.5, "cm"))
legend <- get_legend(c700)
c700 = c700 + theme(legend.position = "none")
c1700 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_h_4326_1700_st, aes(fill = h_1700, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-69.4, y=8.5, label=name), color="darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[panama,],aes(x=-79.1, y=9.2, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=5.5, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 13.4, label = "Caribbean Sea", fontface = "italic", color = "grey22", size = 1.8) +
annotate(geom = "text", x = -79, y = 5.6, label = "Pacific Sea", fontface = "italic", color = "grey22", size = 1.8) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-80.5, -67.7), ylim = c(4.5, 13.9), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Hurricane Wind Map. \nMRI-1700 years - Ingeniar Ltda") +
theme(plot.title = element_text(size=6)) +
theme(axis.text.x= element_text(size=5)) +
theme(axis.text.y= element_text(size=5)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -10))) +
theme(legend.position = "none")
c3000 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_h_4326_3000_st, aes(fill = h_3000, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-69.4, y=8.5, label=name), color="darkblue", fontface="bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[panama,],aes(x=-79.1, y=9.2, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=5.5, label=name), color = "darkblue", fontface = "bold", size=1.5, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 13.4, label = "Caribbean Sea", fontface = "italic", color = "grey22", size = 1.8) +
annotate(geom = "text", x = -79, y = 5.6, label = "Pacific Sea", fontface = "italic", color = "grey22", size = 1.8) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1) +
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-80.5, -67.7), ylim = c(4.5, 13.9), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Hurricane Wind Map. \nMRI-3000 years - Ingeniar Ltda") +
theme(plot.title = element_text(size=6)) +
theme(axis.text.x= element_text(size=5)) +
theme(axis.text.y= element_text(size=5)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -10))) +
theme(legend.position = "none")
plot_grid(c700, c1700, c3000, legend, ncol=4, rel_widths=c(1, 1, 1, 0.13))
#
#ISD Non-Hurricane Wind Maps
#
file_rl_nh_4326_700_st = "./data/isdmaps/nh_700.tif"
rl_nh_4326_700_st = read_stars(file_rl_nh_4326_700_st)
rl_nh_4326_700_st = setNames(rl_nh_4326_700_st, "Kph")
file_rl_nh_4326_1700_st = "./data/isdmaps/nh_1700.tif"
rl_nh_4326_1700_st = read_stars(file_rl_nh_4326_1700_st)
rl_nh_4326_1700_st = setNames(rl_nh_4326_1700_st, "nh_1700")
file_rl_nh_4326_3000_st = "./data/isdmaps/nh_3000.tif"
rl_nh_4326_3000_st = read_stars(file_rl_nh_4326_3000_st)
rl_nh_4326_3000_st = setNames(rl_nh_4326_3000_st, "nh_3000")
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
sc <- scale_fill_gradientn(colours = myPalette(100), limits=c(188, 346),
breaks=c(188, 200, 225, 250, 275, 300, 325, 346))
theme_set(theme_bw())
world <- ne_countries(scale = "medium", returnclass = "sf")
world_points<- st_centroid(world)
world_points <- cbind(world, st_coordinates(st_centroid(world$geometry)))
colombia = world_points$name == "Colombia"
panama = world_points$name == "Panama"
peru= world_points$name == "Peru"
brazil= world_points$name == "Brazil"
venezuela= world_points$name == "Venezuela"
ecuador= world_points$name == "Ecuador"
c700 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_nh_4326_700_st, aes(fill = Kph, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color="darkblue", fontface="bold", size=2, check_overlap=FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.3, y=9.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 2) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 2) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-82.1, -63.8), ylim = c(-7.5, 15.5), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Non-Hurricane Wind Map. \nMRI-700 years - ISD") +
theme(plot.title = element_text(size=7)) +
theme(axis.text.x= element_text(size=6)) +
theme(axis.text.y= element_text(size=6)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -16))) +
theme(legend.title = element_text(size = 6)) +
theme(legend.text = element_text(size = 6)) +
theme(legend.background = element_blank()) +
theme(legend.key.width = unit(0.1, "cm"))
legend <- get_legend(c700)
c700 = c700 + theme(legend.position = "none")
c1700 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_nh_4326_1700_st, aes(fill = nh_1700, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color="darkblue", fontface="bold", size=2, check_overlap=FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.3, y=9.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 2) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 2) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-82.1, -63.8), ylim = c(-7.5, 15.5), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Non-Hurricane Wind Map. \nMRI-1700 years - ISD") +
theme(plot.title = element_text(size=7)) +
theme(axis.text.x= element_text(size=6)) +
theme(axis.text.y= element_text(size=6)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -16))) +
theme(legend.position = "none")
c3000 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_nh_4326_3000_st, aes(fill = nh_3000, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color="darkblue", fontface="bold", size=2, check_overlap=FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.3, y=9.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 2) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 2) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1) +
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-82.1, -63.8), ylim = c(-7.5, 15.5), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Non-Hurricane Wind Map. \nMRI-3000 years - ISD") +
theme(plot.title = element_text(size=7)) +
theme(axis.text.x= element_text(size=6)) +
theme(axis.text.y= element_text(size=6)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -16))) +
theme(legend.position = "none")
plot_grid(c700, c1700, c3000, legend, ncol=4, rel_widths=c(1, 1, 1, 0.1))
#
#ERA5 Non-Hurricane Wind Maps
#
file_rl_nonhurricanes_4326_700_st = "./data/era5maps/rl_nonhurricanes_4326_700_st.tif"
rl_nonhurricanes_4326_700_st = read_stars(file_rl_nonhurricanes_4326_700_st)
rl_nonhurricanes_4326_700_st = setNames(rl_nonhurricanes_4326_700_st, "Kph")
file_rl_nonhurricanes_4326_1700_st = "./data/era5maps/rl_nonhurricanes_4326_1700_st.tif"
rl_nonhurricanes_4326_1700_st = read_stars(file_rl_nonhurricanes_4326_1700_st)
rl_nonhurricanes_4326_1700_st = setNames(rl_nonhurricanes_4326_1700_st, "nh_1700")
file_rl_nonhurricanes_4326_3000_st = "./data/era5maps/rl_nonhurricanes_4326_3000_st.tif"
rl_nonhurricanes_4326_3000_st = read_stars(file_rl_nonhurricanes_4326_3000_st)
rl_nonhurricanes_4326_3000_st = setNames(rl_nonhurricanes_4326_3000_st, "nh_3000")
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
sc <- scale_fill_gradientn(colours = myPalette(100), limits=c(36, 129), breaks=c(36, 50, 75, 100, 120, 129))
theme_set(theme_bw())
world <- ne_countries(scale = "medium", returnclass = "sf")
world_points<- st_centroid(world)
world_points <- cbind(world, st_coordinates(st_centroid(world$geometry)))
colombia = world_points$name == "Colombia"
panama = world_points$name == "Panama"
peru= world_points$name == "Peru"
brazil= world_points$name == "Brazil"
venezuela= world_points$name == "Venezuela"
ecuador= world_points$name == "Ecuador"
c700 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_nonhurricanes_4326_700_st, aes(fill = Kph, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color="darkblue", fontface="bold", size=2, check_overlap=FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.3, y=9.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 2) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 2) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-82.1, -63.8), ylim = c(-7.5, 15.5), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Non-Hurricane Wind Map. \nMRI-700 years - ERA5") +
theme(plot.title = element_text(size=7)) +
theme(axis.text.x= element_text(size=6)) +
theme(axis.text.y= element_text(size=6)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -16))) +
theme(legend.title = element_text(size = 6)) +
theme(legend.text = element_text(size = 6)) +
theme(legend.background = element_blank()) +
theme(legend.key.width = unit(0.1, "cm"))
legend <- get_legend(c700)
c700 = c700 + theme(legend.position = "none")
c1700 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_nonhurricanes_4326_1700_st, aes(fill = nh_1700, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color="darkblue", fontface="bold", size=2, check_overlap=FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.3, y=9.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 2) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 2) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-82.1, -63.8), ylim = c(-7.5, 15.5), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Non-Hurricane Wind Map. \nMRI-1700 years - ERA5") +
theme(plot.title = element_text(size=7)) +
theme(axis.text.x= element_text(size=6)) +
theme(axis.text.y= element_text(size=6)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -16))) +
theme(legend.position = "none")
c3000 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_nonhurricanes_4326_3000_st, aes(fill = nh_3000, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color="darkblue", fontface="bold", size=2, check_overlap=FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.3, y=9.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 2) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 2) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-82.1, -63.8), ylim = c(-7.5, 15.5), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Non-Hurricane Wind Map. \nMRI-3000 years - ERA5") +
theme(plot.title = element_text(size=7)) +
theme(axis.text.x= element_text(size=6)) +
theme(axis.text.y= element_text(size=6)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -16))) +
theme(legend.position = "none")
plot_grid(c700, c1700, c3000, legend, ncol=4, rel_widths=c(1, 1, 1, 0.1))
#
#ISD Hurricane & Non-Hurricane Wind Maps
#
file_rl_combined_4326_700_st = "./data/isdmaps/isd_combined_4326_700_st.tif"
rl_combined_4326_700_st = read_stars(file_rl_combined_4326_700_st)
rl_combined_4326_700_st = setNames(rl_combined_4326_700_st, "Kph")
file_rl_combined_4326_1700_st = "./data/isdmaps/isd_combined_4326_1700_st.tif"
rl_combined_4326_1700_st = read_stars(file_rl_combined_4326_1700_st)
rl_combined_4326_1700_st = setNames(rl_combined_4326_1700_st, "c_1700")
file_rl_combined_4326_3000_st = "./data/isdmaps/isd_combined_4326_3000_st.tif"
rl_combined_4326_3000_st = read_stars(file_rl_combined_4326_3000_st)
rl_combined_4326_3000_st = setNames(rl_combined_4326_3000_st, "c_3000")
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
sc <- scale_fill_gradientn(colours = myPalette(100), limits=c(188, 438), breaks=c(188, 200, 250, 300, 350, 400, 438))
theme_set(theme_bw())
world <- ne_countries(scale = "medium", returnclass = "sf")
world_points<- st_centroid(world)
world_points <- cbind(world, st_coordinates(st_centroid(world$geometry)))
colombia = world_points$name == "Colombia"
panama = world_points$name == "Panama"
peru= world_points$name == "Peru"
brazil= world_points$name == "Brazil"
venezuela= world_points$name == "Venezuela"
ecuador= world_points$name == "Ecuador"
c700 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_combined_4326_700_st, aes(fill = Kph, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color="darkblue", fontface="bold", size=2, check_overlap=FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.3, y=9.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 2) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 2) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-82.1, -63.8), ylim = c(-7.5, 15.5), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Hurricane & Non-Hurricane Wind Map. \nMRI-700 years - ISD") +
theme(plot.title = element_text(size=7)) +
theme(axis.text.x= element_text(size=6)) +
theme(axis.text.y= element_text(size=6)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -16))) +
theme(legend.title = element_text(size = 6)) +
theme(legend.text = element_text(size = 6)) +
theme(legend.background = element_blank()) +
theme(legend.key.width = unit(0.1, "cm"))
legend <- get_legend(c700)
c700 = c700 + theme(legend.position = "none")
c1700 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_combined_4326_1700_st, aes(fill = c_1700, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color="darkblue", fontface="bold", size=2, check_overlap=FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.3, y=9.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 2) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 2) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-82.1, -63.8), ylim = c(-7.5, 15.5), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Hurricane & Non-Hurricane Wind Map. \nMRI-1700 years - ISD") +
theme(plot.title = element_text(size=7)) +
theme(axis.text.x= element_text(size=6)) +
theme(axis.text.y= element_text(size=6)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -16))) +
theme(legend.position = "none")
c3000 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_combined_4326_3000_st, aes(fill = c_3000, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color="darkblue", fontface="bold", size=2, check_overlap=FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.3, y=9.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 2) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 2) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-82.1, -63.8), ylim = c(-7.5, 15.5), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Hurricane & Non-Hurricane Wind Map. \nMRI-3000 years - ISD") +
theme(plot.title = element_text(size=7)) +
theme(axis.text.x= element_text(size=6)) +
theme(axis.text.y= element_text(size=6)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -16))) +
theme(legend.position = "none")
plot_grid(c700, c1700, c3000, legend, ncol=4, rel_widths=c(1, 1, 1, 0.1))
#
#ERA5 Hurricane & Non-Hurricane Wind Maps
#
file_rl_combined_4326_700_st = "./data/era5maps/rl_combined_4326_700_st.tif"
rl_combined_4326_700_st = read_stars(file_rl_combined_4326_700_st)
rl_combined_4326_700_st = setNames(rl_combined_4326_700_st, "Kph")
file_rl_combined_4326_1700_st = "./data/era5maps/rl_combined_4326_1700_st.tif"
rl_combined_4326_1700_st = read_stars(file_rl_combined_4326_1700_st)
rl_combined_4326_1700_st = setNames(rl_combined_4326_1700_st, "c_1700")
file_rl_combined_4326_3000_st = "./data/era5maps/rl_combined_4326_3000_st.tif"
rl_combined_4326_3000_st = read_stars(file_rl_combined_4326_3000_st)
rl_combined_4326_3000_st = setNames(rl_combined_4326_3000_st, "c_3000")
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
sc <- scale_fill_gradientn(colours = myPalette(100), limits=c(37, 438), breaks=c(37, 100, 200, 300, 400, 438))
theme_set(theme_bw())
world <- ne_countries(scale = "medium", returnclass = "sf")
world_points<- st_centroid(world)
world_points <- cbind(world, st_coordinates(st_centroid(world$geometry)))
colombia = world_points$name == "Colombia"
panama = world_points$name == "Panama"
peru= world_points$name == "Peru"
brazil= world_points$name == "Brazil"
venezuela= world_points$name == "Venezuela"
ecuador= world_points$name == "Ecuador"
c700 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_combined_4326_700_st, aes(fill = Kph, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color="darkblue", fontface="bold", size=2, check_overlap=FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.3, y=9.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 2) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 2) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-82.1, -63.8), ylim = c(-7.5, 15.5), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Hurricane & Non-Hurricane Wind Map. \nMRI-700 years - ERA5") +
theme(plot.title = element_text(size=7)) +
theme(axis.text.x= element_text(size=6)) +
theme(axis.text.y= element_text(size=6)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -16))) +
theme(legend.title = element_text(size = 6)) +
theme(legend.text = element_text(size = 6)) +
theme(legend.background = element_blank()) +
theme(legend.key.width = unit(0.1, "cm"))
legend <- get_legend(c700)
c700 = c700 + theme(legend.position = "none")
c1700 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_combined_4326_1700_st, aes(fill = c_1700, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color="darkblue", fontface="bold", size=2, check_overlap=FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.3, y=9.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 2) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 2) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-82.1, -63.8), ylim = c(-7.5, 15.5), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Hurricane & Non-Hurricane Wind Map. \nMRI-1700 years - ERA5") +
theme(plot.title = element_text(size=7)) +
theme(axis.text.x= element_text(size=6)) +
theme(axis.text.y= element_text(size=6)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -16))) +
theme(legend.position = "none")
c3000 = ggplot(data = world) +
geom_sf(fill= "antiquewhite", size=0.1) +
geom_stars(data = rl_combined_4326_3000_st, aes(fill = c_3000, x = x, y = y)) +
sc+
geom_text(data= world_points[venezuela,],aes(x=-66.5, y=8.5, label=name), color="darkblue", fontface="bold", size=2, check_overlap=FALSE) +
geom_text(data= world_points[panama,],aes(x=-80.3, y=9.2, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[ecuador,],aes(x=-78.8, y=-1, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[peru,],aes(x=-75, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[brazil,],aes(x=-68, y=-6, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
geom_text(data= world_points[colombia,],aes(x=-71, y=4, label=name), color = "darkblue", fontface = "bold", size=2, check_overlap = FALSE) +
annotate(geom = "text", x = -77.5, y = 14, label = "Caribbean\nSea", fontface = "italic", color = "grey22", size = 2) +
annotate(geom = "text", x = -80.5, y = 5, label = "Pacific\nSea", fontface = "italic", color = "grey22", size = 2) +
geom_sf(data = st_cast(world, "MULTILINESTRING"), size=0.1)+
annotation_scale(location = "bl", width_hint = 0.5, height = unit(0.1, "cm"), line_width = 0.1, text_cex = 0.5, pad_x = unit(0.03, "in"),
pad_y = unit(0.03, "in")) +
annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.01, "in"), pad_y = unit(0.02, "in"), height = unit(0.5, "cm"),
width = unit(0.5, "cm"), style = north_arrow_minimal) +
coord_sf(xlim = c(-82.1, -63.8), ylim = c(-7.5, 15.5), expand = FALSE) +
xlab("") +
ylab("") +
ggtitle("Hurricane & Non-Hurricane Wind Map. \nMRI-3000 years - ERA5") +
theme(plot.title = element_text(size=7)) +
theme(axis.text.x= element_text(size=6)) +
theme(axis.text.y= element_text(size=6)) +
theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.1)) +
theme(panel.background = element_rect(fill = "aliceblue")) +
theme(plot.margin=unit(c(0,0,0,0),"cm")) +
theme(plot.background = element_rect(fill = "transparent", color = NA))+
theme(axis.text.x = element_text(margin = margin(t =2, b = -14))) +
theme(axis.text.y = element_text(margin = margin(r =2, l = -16))) +
theme(legend.position = "none")
plot_grid(c700, c1700, c3000, legend, ncol=4, rel_widths=c(1, 1, 1, 0.1))
Software requirements:
# List of packages required for this analysis
pkg <- c("actuar", "bbmle", "bookdown", "cowplot", "devtools", "DiagrammeR", "dplyr", "evd", "evir", "evmix", "extRemes", "extremeStat",
"fitdistrplus", "geoR", "ggmap", "ggplot2", "ggrepel", "ggspatial", "grid", "gridExtra", "gstat", "ismev", "kableExtra", "knitr", "lattice",
"lmom", "lmomco", "lubridate", "magick", "maptools", "mapview", "MASS", "ncdf4", "openair", "plot3D", "plotly", "POT", "quantmod", "raster",
"RcmdrMisc", "RColorBrewer", "Renext", "rgdal", "rgl", "rnaturaleearth", "rnaturaleearthdata", "RPostgreSQL", "RStoolbox", "sf", "shape",
"sp", "SpatialExtremes", "stars", "thesisdown", "tibble", "tidyr", "viridis", "xls", "xlsx", "xts")
# Check if packages are not installed and assign the
# names of the packages not installed to the variable new.pkg
new.pkg <- pkg[!(pkg %in% installed.packages())]
# If there are any packages in the list that aren't installed,
# install them
if (length(new.pkg))
install.packages(new.pkg, repos = "http://cran.rstudio.com")
#Load packages
library(actuar)
library(bbmle)
library(bookdown)
library(cowplot)
library(devtools)
library(DiagrammeR)
library(dplyr)
library(evd)
library(evir)
library(evmix)
library(extRemes)
library(extremeStat)
library(fitdistrplus)
library(geoR)
library(ggmap)
library(ggplot2)
library(ggrepel)
library(ggspatial)
library(grid)
library(gridExtra)
library(gstat)
library(ismev)
library(kableExtra)
library(knitr)
library(lattice)
library(lmom)
library(lmomco)
library(lubridate)
library(magick)
library(maptools)
library(mapview)
library(MASS)
library(ncdf4)
library(openair)
library(plot3D)
library(plotly)
library(POT)
library(quantmod)
library(raster)
library(RcmdrMisc)
library(RColorBrewer)
library(Renext)
library(rgdal)
library(rgl)
library(rnaturaleearth)
library(rnaturaleearthdata)
library(RPostgreSQL)
library(RStoolbox)
library(sf)
library(shape)
library(sp)
library(SpatialExtremes)
library(stars)
library(thesisdown)
library(tibble)
library(tidyr)
library(viridis)
library(xls)
library(xlsx)
library(xts
After calculation of correction factors for each station (IDEAM or ISD), see Data Standardization in Methodology section, procedure to modify original wind velocity values using R code, can be done by different ways.
There is one text file for each station time series in IDEAM variables VV_AUT_2 (instantaneous wind velocity each two seconds), and VV_AUT_10 (instantaneous wind velocity each ten seconds), see @ref(tab:tabledatasources2). There are twelve files corresponding to twelve different stations. File names follow the format VV_AUT_2@*.data, where * is replaced by the station identifier, as can be seen below. Same format applies to files in variable VV_AUT_10.
VV_AUT_2@48015050.data
VV_AUT_2@52055230.data
VV_AUT_2@26125061.data
...
Below is the content of a time series file (VV_AUT_2@48015050.data). It has two columns (Fecha and Valor) separated by character |.
Fecha|Valor
2019-01-04 00:00:00|0.2
2019-01-04 00:02:00|0.4
2019-01-04 00:04:00|0.4
...
Next code, from IDEAM variable VV_AUT_2 (wind velocity each two seconds), read all twelve time series text files from folder ./data/manual/VV_AUT_2/, calculate hourly mean, apply a 3-s gust correction factor of 1.52, and roughness corrections factors stored in the column fc_zo of stationssample data-frame, nonetheless, this code is not efficient in terms of memory management, because it loads all times series to memory, using the object ldf, and load all standardized time series to memory, using object ldf_hourlymean.
path_vv_aut_2 = "./data/manual/VV_AUT_2/"
filenames = list.files(path_vv_aut_2, pattern = "VV_AUT_2@")
#Be aware that ldf is a list of dataframes and you need to do [[]], to go inside it
#Create a list dataframes in ldf
#To access each station time series, you need to use ldf[[integer]]
ldf = lapply(filenames, function(x) {
dat = read.table(paste0(path_vv_aut_2, x), header=TRUE, sep="|", stringsAsFactors=FALSE)
# Add column names
names(dat) = c('fecha', 'valor')
#Take control of possible bad original values
dat$valor[dat$valor > 300] = NA
dat$valor[dat$valor < 1] = NA
#Get station id from file name
dat$station_id = substring(x,10,nchar(x)-5)
#Add datetime object
dat$mydatetime = as.POSIXct(dat$fecha,format="%Y-%m-%d %H:%M:%S", tz="UTC")
return(dat)})
#Stations IDs and Roughness corrections factors
stationssample = data.frame(
ideam_id = as.character(
c(48015050, 52055230, 26125061, 26125710, 23085270, 27015330,
16015501, 23195502, 13035501, 28025502, 15065180, 29045190)),
fc_zo =
c(1.197230052, 1.219771719, 1.102205474 , 1.18154867, 1.113341504, 1.29241596,
1.102205474, 1.177562503, 1.114968586, 1.184849704, 1.5, 1.224744381),
stringsAsFactors=FALSE)
#Calculate hourly mean from original time series, and apply correction factors
ldf_hourlymean <- NULL
for(station in ldf){
library(xts)
library(dplyr)
select <- dplyr::select
#Create XTS object removing NA values
myxts = na.omit(xts(x=select(station, valor), order.by = station$mydatetime))
#Conversion to 3-s gust using Durst curve
#Gust factor from Durst curve equal to 1.52
myxts$valor = myxts$valor * 1.52
#Correction factor by Roughness
fczo = stationssample$fc_zo[stationssample$ideam_id == station$station_id[1]]
myxts$valor = myxts$valor * fczo
#Name of time series is the string "X" + Station ID
colnames(myxts) = paste0("X", station$station_id[1])
#Extract index values of last observation of each hour
endhour = endpoints(myxts,on="hours")
#Calculate hourly mean
myxtshour = xts::period.apply(myxts, INDEX=endhour, FUN=mean)
#Rounding time series to previous hour
index(myxtshour)=trunc(index(myxtshour),"hours")
#Store all standardized time series in ldf_hourlymean
ldf_hourlymean <- cbind(ldf_hourlymean, myxtshour)}
Next code, from ISD data source stored in unstacked PostgreSQL table isd_lite_unstack, twelve stations are loaded (using WHERECLAUSE to filter), and roughness correction factor is applied. Listed ISD stations, are equivalent to previous IDEAM stations, reason why correction factor values are identical. An efficient use of memory is made here due to the use of lazy tibble data-frames from PostgreSQl database, see Annex @ref(dbstoring) Database Storing.
#Stations IDs and Roughness corrections factors
stationssample = data.frame(
isd_usaf_id= as.character(
c(803980, 803700, 802110, 802100, 801120, 801100, 800970, 800940, 800630, 800360, 800350, 800280)),
fc_zo =
c(1.197230052, 1.219771719, 1.102205474 , 1.18154867, 1.113341504, 1.29241596,
1.102205474, 1.177562503, 1.114968586, 1.184849704, 1.5, 1.224744381),
stringsAsFactors=FALSE)
#List of field to read from table, formated with "X" character as column prefix name,
#and double quotes
originalfields1 = stationssample$isd_usaf_id
newfields1 = paste ("X", originalfields1, sep="")
originalfields1 = paste('"', originalfields1, '"', sep = "")
newfields1 = paste('"', newfields1, '"', sep = "")
#Roughness correction factor
fczo = stationssample$fc_zo
#Apply roughness correction factor, and
#force to NULL all values below one
fiedls_query1 = paste("CASE WHEN", originalfields1, "< 1", "THEN NULL ELSE", originalfields1,
"*", fczo, "END AS", newfields1, sep = " ")
#Join datatime field "mydatetime" to the query
fiedls_query1 = c(paste('"', "mydatetime", '"', sep = ""), fiedls_query1)
fiedls_query1 = paste (fiedls_query1, "", sep= "", collapse=", ")
#Construct WHERECLAUSE
wherestring1 = stationssample$isd_usaf_id
wherestring1 = paste('"', wherestring1, '"', sep = "")
#Only take values greater than one and not null
wherestring1 = paste(wherestring1, ">= 1 AND", wherestring1, "IS NOT NULL", sep = " ")
wherestring1 = paste(wherestring1, collapse = ") OR (" , sep = " ")
wherestring1 = paste("(", wherestring1, ")", sep ="")
#Final query
query1 = paste("select", fiedls_query1, "from isd_lite_unstack", "where", wherestring1, sep=" ")
#View content of query1, but first split SQL command to show inside PDF document
mycat <- function(text){
text2 = gsub(pattern = "CASE", replacement = "\n CASE", x = text)
text3 = gsub(pattern = "from", replacement = "\n from", x = text2)
cat(gsub(pattern = "OR", replacement = "\n OR", x = text3))}
mycat(query1)
## select "mydatetime",
## CASE WHEN "803980" < 1 THEN NULL ELSE "803980" * 1.197230052 END AS "X803980",
## CASE WHEN "803700" < 1 THEN NULL ELSE "803700" * 1.219771719 END AS "X803700",
## CASE WHEN "802110" < 1 THEN NULL ELSE "802110" * 1.102205474 END AS "X802110",
## CASE WHEN "802100" < 1 THEN NULL ELSE "802100" * 1.18154867 END AS "X802100",
## CASE WHEN "801120" < 1 THEN NULL ELSE "801120" * 1.113341504 END AS "X801120",
## CASE WHEN "801100" < 1 THEN NULL ELSE "801100" * 1.29241596 END AS "X801100",
## CASE WHEN "800970" < 1 THEN NULL ELSE "800970" * 1.102205474 END AS "X800970",
## CASE WHEN "800940" < 1 THEN NULL ELSE "800940" * 1.177562503 END AS "X800940",
## CASE WHEN "800630" < 1 THEN NULL ELSE "800630" * 1.114968586 END AS "X800630",
## CASE WHEN "800360" < 1 THEN NULL ELSE "800360" * 1.184849704 END AS "X800360",
## CASE WHEN "800350" < 1 THEN NULL ELSE "800350" * 1.5 END AS "X800350",
## CASE WHEN "800280" < 1 THEN NULL ELSE "800280" * 1.224744381 END AS "X800280"
## from isd_lite_unstack where ("803980" >= 1 AND "803980" IS NOT NULL)
## OR ("803700" >= 1 AND "803700" IS NOT NULL)
## OR ("802110" >= 1 AND "802110" IS NOT NULL)
## OR ("802100" >= 1 AND "802100" IS NOT NULL)
## OR ("801120" >= 1 AND "801120" IS NOT NULL)
## OR ("801100" >= 1 AND "801100" IS NOT NULL)
## OR ("800970" >= 1 AND "800970" IS NOT NULL)
## OR ("800940" >= 1 AND "800940" IS NOT NULL)
## OR ("800630" >= 1 AND "800630" IS NOT NULL)
## OR ("800360" >= 1 AND "800360" IS NOT NULL)
## OR ("800350" >= 1 AND "800350" IS NOT NULL)
## OR ("800280" >= 1 AND "800280" IS NOT NULL)
#Connect to Database
library('RPostgreSQL')
## Loading required package: DBI
pg = dbDriver("PostgreSQL")
con1 = dbConnect(pg, user="user1", password="user1", host="localhost", port=5432, dbname="winddata")
#Create tibble dataframe with applied correction factors
isdlite = tbl(con1, sql(query1))
class(isdlite)
## [1] "tbl_PostgreSQLConnection" "tbl_dbi"
## [3] "tbl_sql" "tbl_lazy"
## [5] "tbl"
select(isdlite, paste0("X", stationssample$isd_usaf_id[6:12]))
## # Source: lazy query [?? x 7]
## # Database: postgres 10.0.5 [user1@localhost:5432/winddata]
## X801100 X800970 X800940 X800630 X800360 X800350 X800280
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 NA NA NA NA NA NA 1.22
## 2 NA NA NA NA NA NA 6.25
## 3 NA NA NA NA NA NA 6.25
## 4 NA NA NA NA NA NA 2.57
## 5 NA NA NA NA NA NA 3.80
## 6 NA NA NA NA NA NA 2.57
## 7 NA NA NA NA NA NA 3.80
## 8 NA NA NA NA NA NA 1.22
## 9 NA NA NA NA NA NA 5.02
## 10 NA NA NA NA NA NA 1.22
## # ... with more rows
Finally, recommended option to apply correction factors is shown using variable VV_AUT_10 (instantaneous wind velocity each ten seconds) from IDEAM data-source . It is based in an Excel file with all correction factors by each station, and time series in text files. In column total correction factor \(F_{total}\), next Table @ref(tab:cf), integrates correction factors due to surface roughness \(F_{e}\), and gust \(F_{gust}\). As it is described in Data Standardization, (???) is applied to calculate \(F_e\), using roughness \(Z_o\), gradient height \(Z_g\), empirical exponent \(\alpha\), and exposure coefficient \(K_z\). For \(F_{gust}\), Durst curve is used, see Averaging Time 3-s Gust. Despite it is not using lazy tibble data-frames in PostgreSQL, next code uses efficiently memory resources, as it loads only one file time series to memory in a serial iterative process.
library(xlsx)
#Read file with correction factors
fc <-read.xlsx(file="./data/manual/correction_factors_isd_ideam.xlsx", sheetName="fc", header=TRUE,
stringsAsFactors = F)
#In next folder all text files time series from variable VV_AUT_10 are stored
path_vv_aut_10 = "./data/manual/VV_AUT_10/"
#Read files inside previous folder, with pattern VV_AUT_10@ in the name
filenames = list.files(path_vv_aut_10, pattern = "VV_AUT_10@")
#Variable to hold station IDs without roughness correction factor
mymissing = NULL
#Do a loop for each 'filename' inside 'filenames' variable
for(filename in filenames){
#Read current filename from disk into memory, variable 'dat'
dat = read.table(paste0(path_vv_aut_10, filename), header=TRUE, sep="|", stringsAsFactors=FALSE)
# Modify column names
names(dat) = c('fecha', 'valor')
#Take control of possible bad original values
dat$valor[dat$valor > 300] = NA
dat$valor[dat$valor < 1] = NA
#Get station id from current file name
station_id = substring(filename, 11, nchar(filename)-5) #this is for vv_aut_2
#Add datetime object
dat$mydatetime = as.POSIXct(dat$fecha,format="%Y-%m-%d %H:%M:%S", tz="UTC")
library(xts)
library(dplyr)
select <- dplyr::select
#Create XTS object removing NA values
myxts = na.omit(xts(x=select(dat, valor), order.by = dat$mydatetime))
#Get correction factor by Roughness
fexpo = fc$fexpo[fc$station_id == as.integer(station_id)]
#Keep the record of stations without correction factors in Excel file
if (length(fexpo) == 0) {
mymissing = c(mymissing, station$station_id[1])
cat(paste("Station ", station$station_id[1], "not found in correction factors file", "\n"))}
#Get correction factor by gust
#All stations are reported in Excel file for gust correction factor
fgust = fc$fgust[fc$station_id == as.integer(station$station_id[1])]
#Apply correction factors
myxts$valor = myxts$valor * fexpo * fgust
#Name of time series is the string "X" + Station ID
colnames(myxts) = paste0("X", station$station_id[1])
#Extract index values of last observation of each hour
endhour = endpoints(myxts,on="hours")
#Calculate hourly mean
myxtshour = xts::period.apply(myxts, INDEX=endhour, FUN=mean)
#Rounding time series to previous hour
index(myxtshour)=trunc(index(myxtshour),"hours")
#Hourly corrected time series are ready!
#Proceed to apply next step in methodology: POT-PP
#...}
Next Table shows specific code used to compare all data sources. Complete R code report can be found in Annex @ref(rcode) - Research R Code - Digital Files, Table .
Procedure to run data comparison using VV_AUT_2 IDEAM variable:
For this procedure, needed R files are VV_AUT_2_1.r, VV_AUT_2_2.r, and VV_AUT_2_3.r, see Table .
Install R version 3.6.2, RStudio Version 1.2.5033, and 60 R packages dependencies (see chunk of code at beginning of this manual)
Verify files and variables according to descriptions and recommendations of following list (from 1 to 5), then execute file VV_AUT_2_1.r.
Files to run quality data comparison are inside the folder …/downscaling/qualitydata/. Main file to run this process is VV_AUT_2_1.r, and inside it, next list of variables and code need to be configured.
stationssample = data.frame(
isd_usaf_id= as.character(
c(803980, 803700, 802110, 802100, 801120, 801100,
800970, 800940, 800630, 800360, 800350, 800280)),
ideam_id = as.character(
c(48015050, 52055230, 26125061, 26125710, 23085270, 27015330,
16015501, 23195502, 13035501, 28025502, 15065180, 29045190)),
fc_zo =
c(1.197230052, 1.219771719, 1.102205474 , 1.18154867, 1.113341504, 1.29241596,
1.102205474, 1.177562503, 1.114968586, 1.184849704, 1.5, 1.224744381),
stringsAsFactors=FALSE)
Inside file VV_AUT_2_1.r, all needed correction factors are hard coded in variable stationssample. If a more suitable calculation of correction factors is done, those values can be updated inside this data-frame.
path_vv_aut_2 = "./data/manual/VV_AUT_2/"
Inside folder path_vv_aut_2, non-standardized time series text files must be stored (one file per station history). Be aware that all wind velocities must not be standardized. Twelve files are inside this folder, see complete list below.
VV_AUT_2@13035501.data
VV_AUT_2@15065180.data
VV_AUT_2@16015501.data
VV_AUT_2@23085270.data
VV_AUT_2@23195502.data
VV_AUT_2@26125061.data
VV_AUT_2@26125710.data
VV_AUT_2@27015330.data
VV_AUT_2@28025502.data
VV_AUT_2@29045190.data
VV_AUT_2@48015050.data
VV_AUT_2@52055230.data
Files names must follow this convention VV_AUT_2@*.data, where * is the IDEAM station ID. Content of each file must follow structure shown below.
Fecha|Valor
2015-05-30 17:20:00|0.2
2015-05-30 17:22:00|0.1
2015-05-30 17:26:00|0.1
2015-05-30 17:28:00|0.2
2015-05-30 17:30:00|0.4
2015-05-30 17:32:00|0.2
2015-05-30 17:34:00|0.2
...
First line must always have next content Fecha|Valor, representing two column names, first column Fecha, has the date time in the format “YYYY-MM-DD HH:MM:SS”, time zone UTC, and second column Valor, must have the non-standardized wind velocity in kilometers per hour. Separation of each column value must be | character.
con1 = dbConnect(pg, user="user1", password="user1", host="localhost", port=5432, dbname="winddata")
In variable con1 all PostgreSQL database credentials are defined. Tables to use are isd_all_stations, and ideam_all_stations, corresponding to ISD and IDEAM stations catalogs, and isd_lite_unstack corresponding to ISD unstacked time series. Verify values in con1 according to current database configuration.
source('VV_AUT_2_2.r')
...
source('VV_AUT_2_3.r')
Inside files VV_AUT_2_2.r and VV_AUT_2_3.r all code to read ERA5 dataset is defined. The most important line there, defines the location of NetCDF file outfile_nc4c_zip9.nc with variable fg10 (3-s wind gust). Be sure that ncname (shown below) variable is pointing to the right place where NetCDF file is stored.
(ncname <- "../data/outfile_nc4c_zip9")
somePDFPath = paste(paste("isdideamera5", i, statideam, sep = "_"), "pdf", sep=".")
Variable somePDFPath holds the name of the PDF file with the quality data comparison between IDEAM, ISD and ERA5. After execution, there will be a file for each station provided in data frame stationssample. Inside each PDF file, comparison time series graphics are provided, as well as scatter plots. Using graphics inside this file, it is possible to visually define if exist downscaling support to use model or forecast data (ISD and ERA5), by the comparison against measure data (IDEAM). See Table @ref(tab:resultsstructure) in Annex Results - Digital Files, for a description of all digital output files of this research.
Procedure to run data comparison using VV_AUT_10 IDEAM variable:
For this procedure, needed R file is VV_AUT_10.r, see Table .
Install R version 3.6.2, RStudio Version 1.2.5033, and 60 R packages dependencies (see chunk of code at beginning of this manual)
Verify files and variables according to descriptions and recommendations of following list (from 1 to 6), then execute file VV_AUT_10.r.
Inside the folder …/downscaling/nonqualitydata/, it is possible to find files to run non-quality data comparison. Main file to run this process is VV_AUT_10.r, and inside it, next list of variables and code need to be configured.
con1 = dbConnect(pg, user="user1", password="user1", host="localhost", port=5432, dbname="winddata")
In variable con1 all PostgreSQL database credentials are defined. Tables to use are isd_all_stations, and ideam_all_stations, corresponding to ISD and IDEAM stations catalogs, and isd_lite_unstack corresponding to ISD unstacked time series. Verify values in con1 according to current database configuration.
stationssample = data.frame(
isd_usaf_id= as.character(
c(803980, 803700, 802110, 802100, 801120, 801100,
800970, 800940, 800630, 800360, 800350, 800280)),
ideam_id = as.character(
c(48015050, 52055230, 26125061, 26125710, 23085270, 27015330,
16015501, 23195502, 13035501, 28025502, 15065180, 29045190)),
fc_zo =
c(1.197230052, 1.219771719, 1.102205474 , 1.18154867, 1.113341504, 1.29241596,
1.102205474, 1.177562503, 1.114968586, 1.184849704, 1.5, 1.224744381),
stringsAsFactors=FALSE)
Inside file VV_AUT_2_1.r, all needed correction factors are hard coded in variable stationssample. If a more suitable calculation of correction factors is done, those values can be updated inside this data-frame.
ncname <- "../data/outfile_nc4c_zip9"
Variable ncname, defines the location of NetCDF file outfile_nc4c_zip9.nc with variable fg10 (3-s wind gust). Be sure that ncname variable is pointing to the right place where mentioned NetCDF file is stored.
fc <-read.xlsx(file="correction_factors_isd_ideam.xlsx", sheetName="fc", header=TRUE,
stringsAsFactors = F)
Variable fc must point to Excel file where correction factors and dependent variables are stored for each analyzed ISD and IDEAM station. This file must have columns Station ID, roughness \(Z_o\), Source, gradient height \(Z_g\), empirical exponent \(\alpha\), exposure coefficient \(K_z\), exposition correction factor \(F_e\), gust correction factor \(F_{gust}\), and total correction factor \(F_{total}\), as is shown in Table @ref(tab:cf). For a detailed explanation, see Data Standardization.
path_vv_aut_10 = "./data/manual/VV_AUT_10/"
Non standardized time series text files for IDEAM data VV_AUT_10 - instantaneous wind velocity each ten (10) minutes, must be stored inside folder path_vv_aut_10, one file per station history. Be aware that all wind velocities must not be standardized. There are a total of 204 stations, that is 204 files. The file name follows the format VV_AUT_10@*.data, where * is replaced by the station identifier, as can be seen below.
VV_AUT_10@31095030.data
VV_AUT_10@29065130.data
VV_AUT_10@29065140.data
...
Content of each file must follow structure shown below. First line must always have next content Fecha|Valor, representing two column names, first column Fecha, has the date time in the format “YYYY-MM-DD HH:MM:SS”, time zone UTC, and second column Valor, must have the non-standardized wind velocity in kilometers per hour. Separation of each column value must be | character.
Fecha|Valor
2008-10-29 10:30:00|0.9
2008-10-29 10:40:00|1.4
2008-10-29 10:50:00|1.2
...
somePDFPath = paste(paste("isdideamera5", i, era5_intersect_ideam["ideam_station",i], sep = "_"),
"pdf", sep=".")
Variable somePDFPath holds the name of the output PDF file with non-quality data comparison between IDEAM, ISD and ERA5. After execution, there will be a file for each match between ERA5 cells and ISD and/or IDEAM intersecting stations. Inside each PDF file, comparison time series graphics will be generated, as well as scatter plots. Using graphics inside this file, it is possible to visually define if exist downscaling support to use model or forecast data (ISD and ERA5), by the comparison against measure data (IDEAM). See Table @ref(tab:resultsstructure) Results - Digital Files, for a description of all digital output files of this research.
Next Table shows specific POT-PP R code used for ISD stations. Complete R code report can be found in Annex @ref(rcode) - Research R Code - Digital Files, Table .
Procedure to run POT-PP in ISD stations:
Install R version 3.6.2, RStudio Version 1.2.5033, and 60 R packages dependencies (see chunk of code at beginning of this manual)
Verify files and variables according to descriptions and recommendations of following list (from 1 to 4), then execute file pot_pp_isd.r.
Files to run POT-PP in ISD stations are inside the folder …/pot_pp/isd/. Main file to run this process is pot_pp_isd.r, and inside it, next list of variables need to be configured.
inputpath="./raw_data/"
Inside this folder inputpath, standardized time series text files must be stored, one file per station history. Be aware that all wind velocities must be already standardized, see Data Standardization section in Methodology. Below, an example of the content of this folder is shown.
raw_data_station_800740.txt
raw_data_station_800770.txt
...
Files names must follow this convention raw_data_station_*.txt, where * must be replaced by the station ID. Its content must follow structure shown below.
"date_time" "kph" "thunder_flag"
"1950/06/22 12:00:00 GMT" 122.60934 "nt"
"1951/06/13 12:00:00 GMT" 203.9521 "nt"
"1951/08/02 12:00:00 GMT" 30.553136 "nt"
"1963/02/26 12:00:00 GMT" 26.585196 "nt"
...
First line must always have next content “date_time” “mph” “thunder_flag”, representing three column names, separated by spaces. First column “date_time”, has the date time in the format “YYYY/MM/DD HH:MM:SS”, time zone UTC. Second column kph, must have the standardized wind velocity in kilometers per hour. Third column “thunder_flag”, must have a flag with one of two possible values “nt” or “t”, representing non-thunderstorm or thunderstorm classification respectively, for the current wind value. For current research, third column always had the flag “nt”, for all rows, all stations. Separation of each column value must be space character. Values corresponding to “date_time” and “thunder_flag” must be enclosed by double quotes.
estaciones <- read.delim(paste0(inputpath, "01 estaciones.txt"),
header = FALSE, sep = "\t")
Variable estaciones, must point to a text file, named ‘01 estaciones.txt’, and stored inside inputpath folder, with the IDs of ISD stations to be processed. Each line of the file must have one ISD station ID. This file must have the list of stations to process, one station per row. If the intention is to run the process for only one station, the content of this file must have a single line, with the corresponding station ID. The code repeats the POT-PP procedure for each line of this file. Below, an example of its content is shown.
800740
800770
...
Be aware that for each line in this file 01 estaciones.txt, one text file with wind historical time series information, named raw_data_station_*.txt, where * represents the same station ID, must be stored inside inputpath folder. See first element, lines up, in this list of variables.
outputpath = "./isd/"
Variable outputpath, should point to the folder where all output files will be stored, after running POT-PP process. The following list describes the main files to be generated, where * will be replaced by correspondent station ID.
Table shows input and output files for ISD stations, after running POT-PP. See Annex @ref(results) - Results - Digital Files, Table , for a complete report of research files.
Main file pot_pp_isd.r, runs supplemental code using the R command source. Be sure that all R code files listed in next chunk of code, are pointing to the right location. See Table with the description of POT-PP ERA5 complementary R files. In the Table of the Annex @ref(rcode) it is possible to see all R files related to this research.
#Library of POT-PP functions, including Dr Adam Pintar R Code (not
#published because this is copyrighted)
source('./code/function_lib.R')
#Raw Data (whole dataset) Statistics and Send to CSV
source('./code/stats_raw_data.r')
#Non Thunderstorm - Create Raw Data Statistics and Send to CSV
source('./code/stats_raw_data_nt.r')
#Thunderstorm - Create Raw Data Statistics and Send to CSV
source('./code/stats_raw_data_t.r')
#Write "t" to csv, but changing to one data per day (the maximum)
#Write "nt" to csv, but changing to one data per day (the maximum)
source('./code/tnt_csv_1perday.r')
#Statistics and graphics for de-clustered non-thunderstorm
source('./code/stats_graphs_dnt.r')
#Statistics and graphics for de-clustered thunderstorm
source('./code/stats_graphs_dt.r')
#Plots for thunderstorm
source('./code/plot_t.r')
#Plots for non-thunderstorm
source('./code/plot_nt.r')
#Plots for non-thunderstorm and thunderstorm
source('./code/plot_t_nt.r')
Main output file of POT-PP analysis is fitted_model_result.xlsx, which contains return levels for typical return periods. Inside this Excel book, sheet pp_pintar, use columns 43 to 53, to create extreme wind maps. Name of those columns are nt_*_poissonprocessintfunc, corresponding to non-thunderstorm return levels using Poisson Process Intensity Function, where * is replaced for 10, 20, 50, 100, 250, 500, 700, 1000, 1700, 3000, and 7000 years. A detailed description of all columns of fitted_model_result.xlsx, can be found in Table .
To create non-hurricane maps using Kriging, from columns 43 to 53 of file fitted_model_result.xlsx, use R files inside folder …/code/pot_pp/isd/maps/, see Table . For instance, to create non-hurricane map with return levels for 10 years MRI, use file rl_10_nh.r, for 50 MRI use rl_50_nh.r, and in the same way for other return periods. Once GeoTIFF images have been created with previous R files, use plot_maps.r to plot maps using ggplot2 R package.
Inside files rl_*_nh.r next types of geostatistical related analysis are coded, mainly using gstat, sf, and stars R packages, see (???), (???), and (???) respectively.
Procedure to create non-hurricane maps:
Run the procedure previously described in this manual, to execute POT-PP in ISD stations. File fitted_model_result.xlsx will be created with all return levels for different return periods.
From file fitted_model_result.xlsx, create file rlisd.xlsx with 20 columns according to next Table @ref(tab:rlisd). See Table for columns descriptions. Keep same number of records in rlisd.xlsx compared with fitted_model_result.xlsx.
| Column ID | Column Name | Column ID | Column Name |
|---|---|---|---|
| 1 | id | 1 | id |
| 2 | t_thresh | 2 | t_thresh |
| 3 | t_mu_location | 3 | t_mu_location |
| 4 | t_psi_scale | 4 | t_psi_scale |
| 5 | nt_thresh | 5 | nt_thresh |
| 6 | nt_mu_location | 6 | nt_mu_location |
| 7 | nt_psi_scale | 7 | nt_psi_scale |
| 8 | distance_w | 8 | distance_w |
| 9 | station | 9 | station |
| 43 | nt_10_poissonprocessintfunc | 10 | nt_10_poissonprocessintfunc |
| 44 | nt_20_poissonprocessintfunc | 11 | nt_20_poissonprocessintfunc |
| 45 | nt_50_poissonprocessintfunc | 12 | nt_50_poissonprocessintfunc |
| 46 | nt_100_poissonprocessintfunc | 13 | nt_100_poissonprocessintfunc |
| 47 | nt_250_poissonprocessintfunc | 14 | nt_250_poissonprocessintfunc |
| 48 | nt_500_poissonprocessintfunc | 15 | nt_500_poissonprocessintfunc |
| 49 | nt_700_poissonprocessintfunc | 16 | nt_700_poissonprocessintfunc |
| 50 | nt_1000_poissonprocessintfunc | 17 | nt_1000_poissonprocessintfunc |
| 51 | nt_1700_poissonprocessintfunc | 18 | nt_1700_poissonprocessintfunc |
| 52 | nt_3000_poissonprocessintfunc | 19 | nt_3000_poissonprocessintfunc |
| 53 | nt_7000_poissonprocessintfunc | 20 | nt_7000_poissonprocessintfunc |
| Credential | Value |
|---|---|
| dbname | winddata |
| host | localhost |
| pot | 5432 |
| user | user1 |
| password | user1 |
Be sure that rlisd.xlsx file is stored in same folder as rl_700_nh.r file, this is, …/code/pot_pp/isd/maps/
For desired MRI, run file rl_*_nh.r, where * correspond to one of next MRI years (10, 20, 50, 100, 250, 500, 700, 1000, 1700, 3000), but it is important to take into account, that despite code with multiple geostatistical analysis are inside the file, you as spatial analysis expert, must review and interpret each partial result, and chose the best semivariance model and prediction map to use as final wind map.
Repeat previous step for all typical MRI
Table shows specific POT-PP R code used for ERA5 stations. Complete R code report can be found in Annex @ref(rcode) - Research R Code - Digital Files, Table .
Stations in ERA5 dataset correspond to its cell centers, from ID equal to 1 (top, left cell), counting next cells to the right, then bottom, up to 3381 (bottom, right cell). See Figure @ref(fig:era5stations).
ERA5 Cells and Stations
Procedure to run POT-PP in ERA5 stations:
Install R version 3.6.2, RStudio Version 1.2.5033, and 60 R packages dependencies (see chunk of code at beginning of this manual)
Verify files and variables according to descriptions and recommendations of following list (from 1 to 2), then execute file pot_pp_era5.r.
Files to run POT-PP in ERA5 stations are inside the folder …/pot_pp/era5/. Main file to run this process is pot_pp_era5.r, and inside it, next list of variables need to be configured.
inputpathnetcdf = "./data/"
Inside this folder inputpathnetcdf, file “outfile_nc4c_zip9.nc”, with variable 3-s wind gust fg10, must be stored. Be aware that ERA5 dataset does not need any type of standardization, as it comes standardized from source. See Annex @ref(datadownload) for a detailed procedure to download ERA5 information from Climate Data Storage - CDS https://cds.climate.copernicus.eu/. 2. outputpath
outputpath = "./era5/"
Variable outputpath, should point to the folder where all output files will be stored, after running POT-PP process. Following list describes main files to be generated, where * will be replaced by correspondent station ID.
Table shows input and output files for ERA5 stations, after running POT-PP. See Table in Annex @ref(results) Results - Digital Files for a complete report of research files.
Main file pot_pp_era5.r, runs supplemental code using the R command source. Be sure that all R code files listed in next chunk of code, are pointing to the right location. See Table with the description of POT-PP ERA5 complementary R files. In Annex @ref(rcode) - Research R Code - Digital Files, Table it is possible to see all R files related to this research.
#Library of POT-PP functions, including Dr Adam Pintar R Code (not
#published because this is copyrighted)
source('./code/function_lib.R')
#Raw Data (whole dataset) Statistics and Send to CSV
source('./code/stats_raw_data.r')
#Non Thunderstorm - Create Raw Data Statistics and Send to CSV
source('./code/stats_raw_data_nt.r')
#Thunderstorm - Create Raw Data Statistics and Send to CSV
source('./code/stats_raw_data_t.r')
#Write "t" to csv, but changing to one data per day (the maximum)
#Write "nt" to csv, but changing to one data per day (the maximum)
source('./code/tnt_csv_1perday.r')
#Statistics and graphics for de-clustered non-thunderstorm
source('./code/stats_graphs_dnt.r')
#Statistics and graphics for de-clustered thunderstorm
source('./code/stats_graphs_dt.r')
#Plots for thunderstorm
source('./code/plot_t.r')
#Plots for non-thunderstorm
source('./code/plot_nt.r')
#Plots for non-thunderstorm and thunderstorm
source('./code/plot_t_nt.r')